#✨│ai-help
gpt 5.5 says: If they want the fastest practical fixes, I’d say:
Reduce batch size to 1.
Lower resolution/model size.
Restart PC.
Update NVIDIA Studio or Game Ready driver.
Reinstall Replay/PyTorch environment if it still fails.
Try WSL2/Linux if Windows keeps producing CUDA launch failures.
it could be issues with lack of vram or maybe you are using too much audio?
your theory is weird
the goal of pretrain training is to make a model capable of predicting what a spectrogram would look like for a given speaker, phoneme, and pitch
a properly trained pretrain should be able to take (phonemes + pitch) inferred from the original audio, plus the speaker vector, and turn them back into something close enough to the original audio
and it should be able to make a good guess of how a different unseen audio would sound if it was said by a known speaker
when you finetune a pretrain using a new speaker it realigns into predicting how any input would sound with the new speaker's voice
chatgpt's take
generally what happens is that the model overfits and loses the ability to predict anything but the content of the finetuning dataset
moving replay to another drive seemed to do the trick
program works fine with default voices, but whenever I upload a custom voice and swap to it, it doesn't work and then the client freezes. does anyone know this issue? no error in cmd, it just ends with...
[Voice Changer] Loading index...
Try loading... model_dir\5\added_IVF256_Flat_nprobe_1_CyreneAidenDawnHSR_v2.index
Version:
MMVCServerSIO_win_onnxdirectML-cuda_v.1.5.3.18a.zip
what you're using is extremely old
default voices only come with very old version of the voice changer
For me no matter the steps it always gives the "overfitting" results
Predicting b to b right but not a to b reliably
This is a General AI Discord Server, please elaborate:
- your pc gpu
- your pc os
- what are you trying to do: TTS, AI Covers, E Girl Trolling / Catfishing or Roleplay
- the tutorial link used
What voice model are you using, what gpu do you have? (Nvidia or AMD) and what are you wanting to do with each
do u have to upload your model to hugging face
This is a General AI Discord Server, please elaborate:
- your pc gpu
- your pc os
- what are you trying to do: TTS, AI Covers, E Girl Trolling / Catfishing or Roleplay
- the tutorial link used
Do you really need to use docker?
Sure.
weights Can't you use it?
chat i need a repo id which supports text generation
where do i find it
im completely new to ai stuff btw
weights is dead
what voice changer do i use
What is your PC GPU? And what do you use the voice changer for?
its rtx 4050 6gb i wanna make my mic sound bad and crispy
For a simple microphone audio effect, you could use a simple voice mod program. That said, Vonovox and Tg Develop's W-Okada fork are AI voice changers that also have FX options, and Vonovox gives better base audio quality.
how to get vonovox
Download Vonovox. https://huggingface.co/dr87/vonovox/resolve/main/Vonovox_beta_17_11.zip
Hi, how do I make an audio with the Bad Bunny 2022 UVST AI?
thanks
What is this Bad Bunny about?

This is the guide about Vonovox. https://docs.aihub.gg/realtime-voice-changer/local/vonovox/
Last update: March 30, 2026
Use WinRAR or 7-Zip to extract the zip content. Inside Vonovox folder, double click on start.bat to launch the program.
what about file explorer
i will be back when it done extract
hey is there a way to setup playback on vonovox?
its taking a bit long to do the warming up voice conversion
what do i do
when it was my first time opening it took forever but after like 1 12min vid it started working
are you looking to make it bad for a simple effect or make like voices more realistic?
youtube videos are outdated asf for RVC related stuff, don't suggest them, what did you use?
no no i used a random vid to pass the time while it opened
the tutorals i used were from him its working fine now i just dunno how to setup playback for the vc
like rn im using the 17_11 vonovox beta
please elaborate:
- your pc gpu
- your pc os
- what are you trying to do: TTS, AI Covers, E Girl Trolling / Catfishing or Roleplay
- the tutorial link used
i meant "here" but it was one Your_Local_Worm sent on the discord.
Nvidia 5070
Win 11
Roleplay/jokin with friends
sorry* im still a lil tired so i keep messing up my words
Guys, can you help me pls?
[Voice Changer] warming up... generating sola buffer.
got this thing and nothing coming after
This is a General AI Discord Server, please elaborate:
- your pc gpu
- your pc os
- what are you trying to do: TTS, AI Covers, E Girl Trolling / Catfishing or Roleplay
- the tutorial link used
This is much better looking, good job
thx, I merged all the help commands into this, we had the forgotten howtoask and an help template lol
Lol
That does look good
thx, btw I was a mod in weights too and don't remember you there (I was more active in AI HUB so maybe it's that), but also thanks for the recent help
I think that Weights role might be deleted one day 😭
Maybe could keep it as nostalgia
just checked and it's just a cosmetic role, so there isn't the risk that one day one of their account gets hacked and that it gets abused lol
My role is the greatest thing ever lmao
I honestly dont remember who gave it to me but somebody did
Alr lol, but any links referring to the site should be removed besides a download to replay at least
are u a staff at moonwake now btw?
Have they fixed the issue with the login thing where new users cannot use it or do you know
Yes, you could say.
What's moonwake?
Also i believe that has been fixed, the latest replay version doesn't have the login thing as far as i could tell
Weights stuff was already removed from the docs (except the security alert at risk), but most things should be removed here soon, i think just the model maker bot needs to remove it which I don't manage but it's not that important
goodluck, how's it going? I remember checking it and it was less active to DreamTavern, I mean AI HUB isn't that active either
DreamTavern got rebranded
It's going alright, i was told not to talk about DreamTavern so i cant say much about it on that topic lol
ohh alright, have a nice day
maybe let's not clutter the help channel, my bad for talking here
Yeah, fair enough lol.
the ai image generator If I can remember right?
Is vonovox and wokada forks web ui only? or can it be run locally like the original wokada. i tried both and they launched the Web ui.
Original Wokad was also using Web UI, just in its own window
Vonovox doesn't use a Web UI, while Wokada Tg Develop does as its a wokada fork
Is it that important to you?
Btw its better you fill the help template form
- Goal : Educational purpose / Roleplay
- Specific Issue: Having major audio issues and random high pitches; audio is not being picked up properly (choppy) with vonovox.
- Full GPU Name: RTX 4070ti 12GB
- Operating System: Windows 11
- Tutorial Link used: None / Previous knowledge.
for the voice models that are just a model, what am i supposed to input as the index for RVC?
leave it empty
will it work like that?
yes
oki ill give it a try ty :3
np
you could add the index of a different model, I dunno how that would affect it tho
what does the index affect?
its not worth it
i usually have the index setting turned all the way down on every model i use anyway
i had another question, is there anyway to increase the quality of the voice that isnt the chunk?
let's say that when you make the model, we make a map of how the target speaker speaks and save it in an index file, which the model uses as a cheat sheet to "remember how certain phonemes sound"
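A rough sketch of the "cheat sheet" idea, not the actual RVC implementation: each incoming feature vector is matched against the stored index rows and blended with its closest match. The single-nearest-neighbour lookup, the function names, and the toy vectors are all assumptions made for illustration; `index_rate` stands in for the index slider mentioned in this chat, where 0 means the index has no effect.

```python
def nearest(query, index_rows):
    """Return the stored row closest to `query` (squared Euclidean distance)."""
    return min(index_rows, key=lambda row: sum((a - b) ** 2 for a, b in zip(query, row)))


def apply_index(feature, index_rows, index_rate):
    """Blend a feature vector with its nearest stored row.

    index_rate = 0.0 keeps the input feature untouched,
    index_rate = 1.0 replaces it with the stored row.
    """
    match = nearest(feature, index_rows)
    return [index_rate * m + (1.0 - index_rate) * f for m, f in zip(match, feature)]


# Toy "index" with two stored feature rows (hypothetical values).
index_rows = [[1.0, 0.0], [0.0, 1.0]]
feature = [0.9, 0.1]

print(apply_index(feature, index_rows, 0.0))  # index ignored
print(apply_index(feature, index_rows, 0.5))  # halfway toward the stored row
```

With the slider at 0 the output equals the input, which fits the advice here that leaving the index empty or turned all the way down still works.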
upping the chunk makes the voice sound better but it takes soo long to translate
you can use extra convert size
same here
do u have AMD or Nvidia
if u have Nvidia u should use vonovox, it's the current best for realtime
no, language is complex and takes a certain amount of time to recognize speech properly in the first place
oh okay i see i see, so theoretically like i could use my own voice if i knew how to make an index and it would sound better when i use it?
i have nvidia
use your index with the model of someone else voice?
i think i use onnxdirectml
no that wouldnt sound great
cuda
where did you download the voice changer from btw
no like with my own, or am i not understanding it properly
so the model trained to output your voice and the index of your voice?
think i already use the max, then between 400-5000 chunk
*500
i think from here? not sure bc i downloaded it a year or two ago and just started using it again
yea i think so
more than a year, probably either the original you're using or deiteris, I'll get u vonovox
why is vonovox better?
i mean yea but why would you want to convert you in you?
ohhh i see what u mean now
i think i get it
so i can just use the model and it will convert my voice anyway
like the index doesnt help much with that
the index for the most of the time is not worth using it
better quality, I believe less delay, it's the only one still receiving any updates or improvements, pitch detection and other stuff is way better too
yeah on the UI i have it on 0 so
oh okay cool!
shame that it isnt open src
true, but I believe it's not due to the patreon thing dr includes for optional support funding it
do i just download these and launch them?
yea! the second link is a virtual audio cable, like vb cable, it's recommended over vb tho,
yea, pretty much
and for vonovox just run the start file
wassup
OH okay i see, i just extracted it do i just launch the setup?
so since you have the model maker tag: i always made my models by splitting the dataset into clips of 3 to 5 sec. is it bullshit to split them into 300 or 400 ms clips to "optimize for small chunks" when going realtime?
never really tried that
setup64, then install driver for vac lite
the virtual cable
64 over the regular setup?
don't split up your dataset manually, applio does it for you
yea that's the correct file, idk why different ones exist
i do know but i do it anyway because when loading datasets of 30+ hours ram cant take it
yea but in principle
never tried small chunks?
always used the applio 1 sec min to 5?
is there anything i need to worry about like having installed or my pc specs or whatever before i launch vonovox? sorry im not that well versed v.v
you should be safe
i got a warning about not having smth installed in the cmd as it launched
post it
the cmd went away like soon after, ill try launch again
Warning: psutil not available, cannot set CPU affinity
wait
oki
you ran setup.bat right?
WAIT
i might be just stupid
i was trying to use the launcher
not start.bat
LMAO
its alright
i feel like a hacker rn
you will need it
before u go
yk?
can i pretty much just import all the models i've been using straight over to this?
yes
okay sweet
okay ty :3
well in that case
for the vac
does this mean its going to become my default audio device ?
okok
so you need to go here, click on audio in the control panel
then simply select the audio device you had before or the one you always use and click set default, which for me is "predefinito", which is italian
yes
will i need to do the same for my mic?
yes
okay sweet ty sm
np
hear the model output?
Sorry for disappearing btw I had to go to the store
its oki hahah
just set the audio output to your headset
oh yea i see
Then you cannot use it in games tho nobody but you could hear it
yea indeed, its playback
oh ok i see
just change it later to your virtual cable
so there is no way to do both? bc on my old one i was able to
just do this for your virtual audio cable
input of cable to output and in game just output of virtual cable
and you will hear it
Go to control panel, audio settings, do this
you can also do this yea
and then the effects i shouldnt have to mess with right?
there is no need tbh
maybe
but the backend and sample rate etc is there any best option?
ok cool cool
48k is fine, most default settings
Block size and pitch is all you'll need to change
for sample rate, use the model's output rate, or 48 kHz if you use the upscaler
upscaler?
oh i see i see
(i recommend not really using it)
the chunk
Just a pitch shifter, only for fun
yes
what
oh okay, assumed that was just the pitch
Pitch is for making it sound correct for your voice, formant is a pitch shifter
Male to female: 3 to 12; female to male: the same but negative
Male to male: 0
Female to female: same, 0
Depends on if you're a guy or girl and the model being female or male
That's for pitch
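To get a feel for what those pitch numbers do, here is a minimal sketch. It assumes the pitch setting is measured in semitones (the chat only gives the numbers, not the unit); the function name is made up for illustration. Each semitone multiplies frequency by the twelfth root of 2, so +12 doubles it and -12 halves it.

```python
def pitch_ratio(semitones):
    """Frequency multiplier for a shift of `semitones` (12 semitones = one octave)."""
    return 2.0 ** (semitones / 12.0)


# Assumed semitone values taken from the ranges mentioned in the chat.
for shift in (0, 3, 12, -12):
    print(f"{shift:+d} semitones -> x{pitch_ratio(shift):.3f}")
```

So a male-to-female setting of 12 would put the output an octave above the input voice, while 0 leaves the pitch where it is.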
okay sweet
Block size (chunk) should be fine at 0.30 or 0.35
is higher better quality? i assume
ye
sweeet
till a point tho
yea
for me some models higher would make it sound a bit odd
oh really?
but for me its always a pain to get them reliable on my voice
Depends, from testing the models get weird after maybe 0.50
@viral mason , all the models you made and tested on your voice, did they actually sound reliable or would they produce garbled audio on some inputs?
or perceptually bad audio
Yea they're completely reliable because I made them lol
Do you have a noisy environment or people talking/tv in the background
nope
Anything that could cause it
sometimes mine picks up my keyboard
Same
What voice model are you using?
my physical voice
doesnt help i SMACK my keys ig
at infer as source
You made a voice model of yourself?
no
what i mean
is that with any voice model, if i convert my voice with it and speak naturally, it ends up making some weird sounds sometimes
unless the target is very similar to me
(btw i am not a native english speaker and my english could be a bit broken) but using same-accent speaker targets and getting the same results is ass
i know mine does that
i have an australian accent so
some words sound really odd
and also if i laugh it always sounds like demons
and i might have found it
rlly?
i mean non verbal are always a pain
yea but its not trivial
yea haha
i cant imagine it would be
and requires quite some work
Laughs are not possible really unless a model is trained on it
my best bet has just been to adjust my voice when i use the rvc and have my mic closer
i mean in theory its easy but requires a lot of work and i dont know how reliable it is right now
yeah makes alot of sense
kinda
If it's not in the dataset it'll usually sound off or just not natural
with my training method the dataset can contain very few of them and maybe sound fine
Yea as long as it's in there should be fine
but for me (from what i've looked into so far) the problem is that finetuning a pretrain screws up what makes the model "speaker invariant"
?
Not really, I'm not into code I just know how to make a good model
oh well
the TE (text encoder) is what paints how a phone and pitch would be if spoken by someone
I hope real spin is implemented to applio soon
its like taking a pic of a car and slapping a red filter on it or a blue one
even if contentvec is not perfect
the content is pretty much the same
and if you finetune the TE on the target, then when someone else uses it, that person is not the target
if it goes out of distribution the decoder struggles, or the output latent itself is weird because it's not sure how it should be
i will test this better soon
fr
is that much of a game changer?
i always trained “vanilla” and got similar results to “improved” versions
@fringe heron @viral mason ty sm for helping me out im having so much fun playing around with it :3333
The real spin won't have the timbre leak and other weird issues, lyere could explain
:)
understood
applio's spin is not the same as the original spin
why?
different projection and doesnt use pred head
gotcha
so it behaves like a worse contentvec lol
ok to be fair, not worse, but the quality of the feature extraction is not as good as the original spin
mhhh thats why my result was bad
btw i also realized this: you said you tried freezing the TextEnc but didn't notice improvements. maybe you kept the posterior enc free to update, which may have left an easy path for the model to become "speaker variant" again, so you saw no improvements. although thinking more about it, freezing it would probably only cause more problems.

where do i get the voicechanger
Please can I get a link to set up vonovox…. On a nvidia 4060, 8gb vram…. What’s the best set up for real time voice changer?
sure, what are u planning on using it for btw
!help-template
To receive assistance, you must provide your system details. Copy and paste the block below into your reply and fill it out.
⚠️ NO INFO = NO HELP
- Goal (e.g., TTS, AI Covers, Roleplay):
- Specific Issue:
- Full GPU Name:
- Operating System:
- Tutorial Link used:
• Check Docs: Many fixes are in the AI Hub Docs.
• Be Specific: Say "RTX 3060 12GB", not just "NVIDIA".
• English Only: Keep all discussions in English.
• No assistance for NSFW/Porn or ANY Illegal Activities.
• Read the [Full Guidelines](#1402790586028789830 message).
we don't need 3
What is your PC GPU? And what do you use the voice changer for?
Is Vonovox having issues? It lags when I play games, and I'm using a 16 GB RTX 5070 Ti.
My internet connection is pretty good; the lag only goes away when I close Vonovox.
- Goal (e.g., TTS, AI Covers, Roleplay): just wanna sound like anime characters
- Specific Issue: dont know what to download i wanna download voicechanger
- Full GPU Name: Nvidia GeForce RTX 5070
- Operating System: win 11
- Tutorial Link used: https://www.youtube.com/watch?v=81KYc8AAmus
Vonovox and Tg Develop's voice changer are the only known voice changers that can work with GeForce RTX 50 series.
could u send me the link please
Download Vonovox. https://huggingface.co/dr87/vonovox/resolve/main/Vonovox_beta_17_11.zip
So why is it lagging?
Can you send the full screenshot?
any1 got a mobile voice changer that’s not bad

- Goal (e.g., TTS, AI Covers, Roleplay): Extended Pretraining of base rvc v2
- Specific Issue: Not really sure how to do this, any guidance? Do I just plug in the D and G models into applio? A CLI option for this would be great..
- Full GPU Name: A100 80GB
- Operating System: Linux (specifically debian)
- Tutorial Link used: N/A
Goal, get good tts be it local or using providers like eleven lab
Silly tavern and marinara engine has little to nothing guided and etc on tts
Gpu is rx6900xt 16gb vram
Os is windows 11
None
@shut yoke don’t promote
Wait huh did something happen
There's no realtime voice changer for mobile
? Someone promoted
I wasn't talking about you
Oh my bad I saw original message deleted so I thought that was it 🤣
It's more of an advanced thing but you could check https://docs.aihub.gg
Last update: April 19, 2026
!help-template
To receive assistance, you must provide your system details. Copy and paste the block below into your reply and fill it out.
⚠️ NO INFO = NO HELP
- Goal (e.g., TTS, AI Covers, Roleplay):
- Specific Issue:
- Full GPU Name:
- Operating System:
- Tutorial Link used:
• Check Docs: Many fixes are in the AI Hub Docs.
• Be Specific: Say "RTX 3060 12GB", not just "NVIDIA".
• English Only: Keep all discussions in English.
• No assistance for NSFW/Porn or ANY Illegal Activities.
• Read the [Full Guidelines](#1402790586028789830 message).
I don't need to know it's advanced lol I checked the docs, nothing much was said there. Just the same advice, use multi-speaker on applio, bla bla bla, do I just plug in the generator and discriminator from v2 and just chug on?
if you use D/G weight to train a model, it is called a finetuning
you can finetune using a single voice.. or using 100+
if you dont use D/G weights and train using 100+ voices, it is called 'creating a pretrain'... nobody has been successful at creating a good one from scratch yet
.. How come all these pretrains like Titan etc crop up then? Those are trained ON rvc v2 aren't they? That is what continued pretraining is no?
they are trained on top of OG pretrain... usually without the same variety, so they kinda suck
iirc finetuning is single-speaker isn't it? + it doesn't necessarily "teach" phonemes or language or necessarily new data towards the model
hmm ive heard
good words abt these pretrains
still, could you lmk how to do it?
docs aren't very helpful this time
do you have 100+ different voices with a wide range of phonemes in their respective datasets?
with equally good audio quality?
if yes, then you can re-tune the og pretrain
yes
i have around 150 - rvc v2 was trained with around 108 or so? does applio cover the extension of the tensor in this case?
Challenge accepted?
like this
and use this new g + original d
hmm does applio do it itself? or do i have to run this on the generator
you need to run this script and match the number of speakers in the prepared dataset
Applio can only adjust when you try from scratch
kay, then do i just plug it into applio and train on the g and d?
you use the extended g and original d as a custom pretrain
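The script shared as "like this" is not preserved in this log, so the following is only a hedged illustration of the general idea, not that script. It assumes the generator keeps one embedding row per speaker and that "extending" means appending rows until the count matches the dataset. The function name, the mean-plus-noise initialisation, and the toy values are all assumptions; a real checkpoint would hold this as a tensor under a key you would need to look up yourself.

```python
import random


def extend_speaker_embedding(rows, new_count, noise=0.01, seed=0):
    """Return a copy of `rows` grown to `new_count` speaker rows.

    Existing rows are kept unchanged. Each added row starts at the mean of
    the existing rows plus a little Gaussian noise (an illustrative choice).
    """
    old_count = len(rows)
    if new_count < old_count:
        raise ValueError("new_count must be >= the current number of speakers")
    dim = len(rows[0])
    mean = [sum(row[i] for row in rows) / old_count for i in range(dim)]
    rng = random.Random(seed)
    extra = [[m + rng.gauss(0.0, noise) for m in mean]
             for _ in range(new_count - old_count)]
    return [list(row) for row in rows] + extra


# Toy example: 3 speakers with 4-dim embeddings, grown to 5 speakers.
table = [[0.0, 1.0, 2.0, 3.0], [1.0, 1.0, 1.0, 1.0], [2.0, 1.0, 0.0, -1.0]]
bigger = extend_speaker_embedding(table, 5)
print(len(bigger), "speakers x", len(bigger[0]), "dims")
```

Keeping the original rows untouched is the point: the pretrained speakers stay as they were, and only the new slots start from scratch.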
if you're training locally, you can simply run a .bat file
env\python rvc\train\train.py VCTK_32k_SP1024 1 20 rvc\models\pretraineds\hifi-gan\f0G32k_emb129.pth rvc\models\pretraineds\hifi-gan\f0D32k.pth 0 16 32000 False True False False 5 False "HiFi-GAN" False
like this
I'm assuming I replace the obvious values like 32k and all with my actual sample rate?
yes
many thanks! any explanation on the rest of the arguments?
model_name = sys.argv[1]
save_every_epoch = int(sys.argv[2])
total_epoch = int(sys.argv[3])
pretrainG = sys.argv[4]
pretrainD = sys.argv[5]
gpus = sys.argv[6]
batch_size = int(sys.argv[7])
sample_rate = int(sys.argv[8])
save_only_latest = strtobool(sys.argv[9])
save_every_weights = strtobool(sys.argv[10])
cache_data_in_gpu = strtobool(sys.argv[11])
overtraining_detector = strtobool(sys.argv[12])
overtraining_threshold = int(sys.argv[13])
cleanup = strtobool(sys.argv[14])
vocoder = sys.argv[15]
checkpointing = strtobool(sys.argv[16])
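As a reading aid, here is a small sketch that pairs each positional argument from the list above with the value used in the example command, then rebuilds the command line. The names and order come from the `sys.argv` listing; `build_command` and `TRAIN_ARGS` are made-up helper names, and the values are simply the ones from the pasted example, not recommendations.

```python
# Positional arguments in the order train.py reads them (sys.argv[1] .. sys.argv[16]).
# Values are copied from the example command in the chat; swap in your own.
TRAIN_ARGS = {
    "model_name": "VCTK_32k_SP1024",
    "save_every_epoch": 1,
    "total_epoch": 20,
    "pretrainG": r"rvc\models\pretraineds\hifi-gan\f0G32k_emb129.pth",
    "pretrainD": r"rvc\models\pretraineds\hifi-gan\f0D32k.pth",
    "gpus": "0",
    "batch_size": 16,
    "sample_rate": 32000,
    "save_only_latest": False,
    "save_every_weights": True,
    "cache_data_in_gpu": False,
    "overtraining_detector": False,
    "overtraining_threshold": 5,
    "cleanup": False,
    "vocoder": "HiFi-GAN",
    "checkpointing": False,
}


def build_command(args, python=r"env\python", script=r"rvc\train\train.py"):
    """Return an argv list that keeps the positional order of `args`."""
    return [python, script] + [str(value) for value in args.values()]


command = build_command(TRAIN_ARGS)
print(" ".join(command))
```

Because the script reads arguments by position, editing the dict values is safe but reordering or dropping keys would shift every argument after it.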
tysm! i was genuinely writing my own rvc train implementation and it was hurting my head so bad haha 😭😭
Do you have huge differences in AI use in your teams? Some are eager to learn new things, some are still about to explore ChatGPT... how to bridge the gap?
so they would need to adjust the script to finetune correctly with expansion on rvc v2? Would that get added in the next applio version?
normal users do not finetune datasets with 100+ speakers
if you use less than 110 you'll be fine 🙂
do you think that script might be needed to get added in the ai hub docs? It’s a pretty rare case
Can any 1 guide me throughly on using Flow
please do honestly, ai hub docs are really sparse when it comes to niche stuff like cpt
very unlikely
but if you want to, go ahead
I just wanted to share this. There's a strategy where, if your laptop doesn't have a dedicated GPU, you go for an online service like Kaggle or Google Colab, which in theory should make the voice changer work perfectly. In practice, when I do that on my current 2012 Dell laptop with a second-gen Intel Core i3 CPU, the audio still stutters because the CPU has to process the audio stream and other programs at the same time, which is usually unbearable in actual runs.
Goal, get good tts be it local or using providers like eleven lab, fish audio for example
to get Silly tavern and marinara engine with working tts
Gpu is rx6900xt 16gb vram
Os is windows 11
None
So, if I actually have to run the voice changer, at this point I'd buy a new PC with a GPU instead. 
I don't feel like I need help about this one, I already know how things work, even if I rarely run the voice changer for anything other than as a dummy program.
Hello can I post a repo to help me?
I did everything I saw online right but I’m still getting bad lag issues
Obs for streaming
What is this repository?
Hi. I have been using a voice model for a while now (with vonovox beta, without any index file, because there was no index file attached to the voice model when i downloaded it), but its quality is sometimes not the best, is there anything i could do to make it better? Like training the voice more myself, enhancing it in any way? It sounds robotic sometimes. I am absolutely willing to spend time on making it better or to learn new things i dont know how to do now, I just want to enhance it somehow
Why roleplay?
Did you follow a guide or tutorial before?
-rt
Guides for Programs that use RVC Models in Realtime for Calls/Games
A Realtime Voice Changer with similar performance to Wokada Tg-Develop Fork, with extra features, but it supports only Nvidia GPUs on Windows 10/11, unlike other options that have wider support, and it has no cloud options.
A personal fork (modified version) of Wokada Deiteris Fork, it just adds some Quality of Life improvements to it like supporting Spin Embedder and Audio Effects. Don't expect too much about it since the creator made it originally as a personal project.
A Realtime Voice Changer with similar performance to Vonovox & Wokada Tg-Develop Fork, with extra features.
Deiteris' fork (modified version) of wokada that doesn't get updates anymore.
These options are not recommended for use.
Not suggested, older versions in youtube tuts are even way worse. GUIDE
The program is worse compared to the ones above, and much less updated. GUIDE
hello, how can i fix my voice chopping of when using a model, its sounds good but every half a second it bugs, i have an rx6600 and a nice maono mic so idk whats causing this issue
What voice changer are u using rn? You could have an older one that doesn't run as good as some of the new stuff
If it's definitely the model and not your mic being too hot or OBS, then most likely the block size is too small. If the GPU can't finish processing a block before the next one arrives, you get dropouts.
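The dropout condition described here can be written down as a tiny check. This is only a sketch of the reasoning: the function names and the timing numbers are hypothetical, and real processing time depends on your GPU and model.

```python
def realtime_factor(block_seconds, processing_seconds):
    """Processing time divided by block length; below 1.0 means the GPU keeps up."""
    return processing_seconds / block_seconds


def block_keeps_up(block_seconds, processing_seconds):
    """True if a block finishes processing before the next one arrives."""
    return processing_seconds < block_seconds


# Hypothetical measurements: the same 0.12 s of processing per block,
# tried with a larger and a smaller block size.
for block in (0.30, 0.10):
    status = "ok" if block_keeps_up(block, 0.12) else "dropouts likely"
    print(f"block {block:.2f}s -> factor {realtime_factor(block, 0.12):.2f} ({status})")
```

Raising the block size gives the GPU more time per block at the cost of more delay, which matches the earlier advice in this chat that 0.30 or 0.35 is usually fine.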
How do people literally disappear after asking for help, they must not be in a hurry
what kind of "bugs" you hear?
!help-template
To receive assistance, you must provide your system details. Copy and paste the block below into your reply and fill it out.
⚠️ NO INFO = NO HELP
- Goal (e.g., TTS, AI Covers, Roleplay):
- Specific Issue:
- Full GPU Name:
- Operating System:
- Tutorial Link used:
• Check Docs: Many fixes are in the AI Hub Docs.
• Be Specific: Say "RTX 3060 12GB", not just "NVIDIA".
• English Only: Keep all discussions in English.
• No assistance for NSFW/Porn or ANY Illegal Activities.
• Read the [Full Guidelines](#1402790586028789830 message).
I had the best beans of my life, i guess its kinda annoying tho
W beans
that one is super outdated, what gpu do u have? (Nvidia or AMD)?
k one sec I'll get u the downloads to the newest thing
yup
first one is the voice changer, second one is a virtual cable that connects it to games or discord ect
is VBcable bad?
I actually have to go somewhere rn so I will not be available consistently, for quick setup run setup64 for vac lite and for tg fork run mmvcserversio
not really but it can be weird at times and cause odd mic issues
back home, any problems or anything?
yes i have a problem
let me hear it :3
bababui
im also out rn, so i havent tested anything out
it says an error occured during my voice convo
did you close the command promt?
no
hmm
what are your microphone settings like?
input should be your regular mic and output should be line 1
do u have a model installed?
client doesn't work for me at all but try both
still getting the same pipeline error
on both
[VoiceChangerManager] 'Pipeline is not initialized.'
hello
if i use general questions, in an offline model, how to know that i have the right model for my build? Is there like a sort of response time? For example, asking how much 1+1 is, and i get an answer in 1 sec or in 1 minute ... /: How can i know this?
I'm not sure how to fix errors, <@&1159293204038955078> help this man
yea, what he said
Hey plz where is the site for downlaod vc
do you want to know the right model for your task or the performance for your hardware?
this is a General AI Discord Server, there isn't a single voice changer program
!help-template
To receive assistance, you must provide your system details. Copy and paste the block below into your reply and fill it out.
⚠️ NO INFO = NO HELP
- Goal (e.g., TTS, AI Covers, Roleplay):
- Specific Issue:
- Full GPU Name:
- Operating System:
- Tutorial Link used:
• Check Docs: Many fixes are in the AI Hub Docs.
• Be Specific: Say "RTX 3060 12GB", not just "NVIDIA".
• English Only: Keep all discussions in English.
• No assistance for NSFW/Porn or ANY Illegal Activities.
• Read the [Full Guidelines](#1402790586028789830 message).
@low shard could you help me on this?
thanks
Last update: April 15, 2026
I was explaining you how to properly ask the help request
i guess, a bit of both... i want mainly working with texts (summarising, explaining books, ...) So, while i think in LLM standards, thats not of the heaviest processes, i also don't want to choose a model where i have to wait 2 minutes on my answer
i download this?
it's not something to download, it's usual fixes to your issue
you could check posts about the models, https://lmarena.ai, and https://www.localllm.run/
Instantly check if your PC or Mac can run popular local LLMs like Llama, Mistral, Phi, and more. Auto-detects your GPU, RAM, and disk space.
ok, thanks
Goal, get good tts be it local or using providers like eleven lab, fish audio for example
to get Silly tavern and marinara engine with working tts
Gpu is rx6900xt 16gb vram
Os is windows 11
None
it wouldve been easier with a nvidia card
i do have one but it's a laptop 3050...
:/
is there another voice changer i could use?
maybe this one doesnt like me lol
Nothing good tbh, you could try Wokada deiteris
It's not as good as tg fork tho
@lusty bear did u check this?
Is this the right channel to ask for pre train models recomendations?
yes
i had some
updates
the model is selected
doesnt have any special characters
did you update everything, select an RVC Model before starting the server, and are using no weird character?
is kanye weird?

could you send a screen recording of you starting the program?
like when i get the error or?
I meant special characters like '£'
no
From when you open the .exe to when you start the server
this is the error
have you checked if the same issue appears on every model?
If so, have you tried re-downloading wokada tg develop?
oh, one thing
the sample rate auto changes to 44100 when i start the server even when i set it to 48000
it might be a corrupted file in the essential files downloaded on first startup, could you try re-downloading the program ?
sure
this one right?
Is this lag normal when I open a game? It didn't used to happen to me before.
Full GPU Name: rtx 5070 ti 16 gb
- Operating System: windows 11 pro
I use the same setup, mine doesn't lag
hi again
have you tried turning the graphics of the game down?
Lower the graphics settings?
Or is it because I have the virtual audio and not the other one they sent me?

I don't think the cable is the issue
try turning down the graphics in game to see if it helps performance
I also set this up
you don't need these tbh
this one is fine at default
how is default?
why
they're filters all they do is change how it can sound, no quality improvements or anything
And aren't they meant to make you sound more like the character?
no, they're more to help with background noise. as for the eq, it modifies sound by increasing or decreasing the loudness of specific pitches to improve clarity, or alters the tone to be warmer, brighter, or less harsh.
that last part I took from google
So, if that helps, I have some background noise and, among other things...
So it doesn't require overclocking the CPU?
why would you do that?
are there any google colabs for training voice models that still work? or is Applio the only way
applio is the best way to train, colab is ok but Kaggle is way better
hi guyssss
heloo
whatcha talkin about/what can i help with :3
woah, ai master :O
have you mastered the arts of the shoggoth?
hmmm? what's that? :3

anyway sad to see weights shut down last month, i recorded the final minutes
at least replay is fully functional without needing an account now, only good thing weights ever made was that
Not even needed.


"Weights.com closure" might sound too overdramatic for some, but in the end it was just another website, so either try move on or ask for AI cover program. 
Not needed.

question for yall
whats the max time on a google colab instance or whatever?
is it till i close the tab orrrr
like 4 hours I think
kaggle gives 30 hours per week for free users ❤️
much better
and easier to monitor
4 hours average in a day.
It sucks that I have all these rvc voices saved but no program to make covers with them
All around just sucks
There's Applio RVC. 
have you heard of replay?
Nahhh
I mainly relied on weights
Like 2024 weights
it was made by weights actually and works the same
lemme look rq
Weights has shut down. Replay remains available for downloads, versions, and updates.
YOU FOOL
u go to weights and the download is at the bottom

For real though, Applio RVC can be a bit harder to use than Replay, but Applio RVC can also be used to train a voice model in one instance.
very true
if he is just looking for a quick cover application replay is nice and requires very little skill to use
fun fact, i just downloaded replay to try it out
and see if its any better than aicovergen or an RVC webui
cant post images :<
nvm
THIS FAT PROGRAM LOL
btw guys i recommend "Mem Reduct" its pretty good and can free up RAM
I'm quite skeptical of this "free RAM" recommendation. Sure, Mem Reduct might help a low-end PC (one with 8 GB) as it claims, but for a PC with around 12 GB or more I don't think there's a point in having it.
You still haven't told about your laptop specs.
- Goal (e.g., TTS, AI Covers, Roleplay): LoRA training - Generation of reference images
- Specific Issue: Can't get QwenVL custom node working
- Full GPU Name: NVidia GeForce RTX 4080
- Operating System: Windows 11
- Tutorial Link used: https://www.youtube.com/watch?v=WRaOsu9TDEM&t=160s
I have tried for days to set up/configure my ComfyUI environment to run the QwenVL custom node and models. Running the workflow in the tutorial (approx. 20:34 in the timeline) I get missing packages (accelerate) and incompatible package versions (torch), and I can't find the correct GGUF models. I would appreciate assistance in getting this sorted. I've been using Copilot and 'he' has been doing my head in, going overboard with possibilities, BUT he has identified that my biggest issue is package versions, given I am trying to keep my environments as clean as possible. Can someone provide a 'pip freeze' of their environment? That would go a long way to helping. Thanks, Steve
PS - can anyone identify the 3 x '..._Sub' nodes used in the ZIT, Qwen2512 and Flux Klein groups?
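A full pip freeze is long, so a quicker first check is to print only the versions of the packages named in the errors, using the same Python interpreter that ComfyUI runs with. A minimal sketch using only the standard library; the package list is just the two names from the message above, and nothing here is specific to ComfyUI or QwenVL:

```python
from importlib import metadata


def installed_version(name):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


# Packages mentioned in the error messages; extend the tuple as needed.
for pkg in ("torch", "accelerate"):
    print(f"{pkg}: {installed_version(pkg) or 'not installed'}")
```

Running this in two environments (yours and one that works) shows at a glance which versions differ.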
i use it quite often (mem reduct) actually, 32gb ram and 6gb vram on a laptop nvidia geforce rtx 4050
never heard of it
guys theres a worm outside my house
:3
ima go 2 bed
Mem Reduct isn't really needed for a system with 32 GB anyway. Your "dedicated" NVIDIA GPU is decent enough for simple AI covers, although not ideal for training anything. I've never used Replay, but if Replay can't detect your laptop GPU, there's Applio RVC as another option.
Guides for Programs that use RVC Models in Realtime for Calls/Games
A Realtime Voice Changer with similar performance to Wokada Tg-Develop Fork, with extra features. It supports only Nvidia GPUs on Windows 10/11, unlike other options that have wider support, and has no cloud options.
A personal fork (modified version) of Wokada Deiteris Fork, it just adds some Quality of Life improvements to it like supporting Spin Embedder and Audio Effects. Don't expect too much about it since the creator made it originally as a personal project.
A Realtime Voice Changer with similar performance to Vonovox & Wokada Tg-Develop Fork, with extra features.
Deiteris' fork (modified version) of wokada that doesn't get updates anymore.
These options are not recommended for use.
Not suggested, older versions in youtube tuts are even way worse. GUIDE
The program is worse compared to the ones above, and much less updated. GUIDE
if you're looking for the voice changer use vonovox for Nvidia gpus but for AMD u can't use Vonovox so use Wokada tg fork
what is ur gpu?
i just deleted the original wokada cuz it says its not good
it's not lol, super unoptimized and stuff
ya just some fun stuff, nothing professional
thank god
so many weirdos join here to play as "girls"
everyone knows that means they wanna scam ppl
they want the method
and we don't allow that here
oki so u recommend nvidia tg fork
i am upgrading to a 5070 ti and ryzen 7 9800x3d soon tho, im assuming i wouldnt use that nvidia tg fork?
for Applio online tensorboard, what do i put for dataset path? i'm running on the public url since the private one doesnt work
gpu is the main thing tbh
what are u running it on?
colab? locally on pc? Kaggle?
I downloaded Applio and uploaded colab to my google drive and ran it there
I am confused
do u have the link for the one we were talking abt
im running this file on google colab
its in the Assets folder when i downloaded Applio
1 sec
wokada tg fork right?
extract the first one, then place the second one in the folder for the first one. third link is a virtual audio cable that is used like VB cable to connect the voice changer to discord or games etc
fanks
hey i keep seeing people talk about "crossfade" and "extra time" but i only see block size. is there any reason why?
- Goal (e.g., TTS, AI Covers, Roleplay): Roleplay
- Specific Issue: i keep seeing people talk about "crossfade" and "extra time" but i only see block size. is there any reason why?
- Full GPU Name: RTX 5070 12GB
- Operating System: Win 11
- Tutorial Link used: None
That's because you're most likely using the Vonovox beta, which removes those and has them set to the best option internally already
So those don't matter
ahh i see
oh about block size is there like an area i should keep it at for it to not have too big a delay while still sounding ok
im using your teto model atm
sorry i forgot to hit the reply button
U can try around 0.30 to 0.50
More or less than that it kinda starts sounding bad
0.25 is also ok
I personally find 0.35 good as well
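For a rough feel of what those numbers mean, here is a small sketch that converts a block size into audio samples. It assumes the block size value is in seconds and the device runs at 48 kHz; neither is confirmed in the chat, and `block_samples` is just an illustrative helper, not part of Vonovox:

```python
def block_samples(block_seconds, sample_rate=48000):
    """Number of audio samples collected per block at the given sample rate."""
    return round(block_seconds * sample_rate)


# Values suggested in the chat, assuming they are seconds.
for block in (0.25, 0.30, 0.35, 0.50):
    print(f"{block:.2f} s -> {block_samples(block)} samples at 48 kHz")
```

Under that assumption, a block-based changer has to collect a whole block before it can convert it, so the block size is roughly the minimum delay you add on top of processing time.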
is this the same way while gaming?
Ya
Nope lol, gotta yap to level up
You're very welcome! If u have more questions or need help or anything u can message me here :3
will do and ty again for the help and the model it sounds really well :D
@viral mason
sorry for bothering but I was looking at your GLaDOS model and I saw a fx file for fl studio. is there a way for me to get effects to work on vonovox so i could use it in real time or is that not possible rn?
How's it going?

Are you trans or looking to troll?
What Worm does (if I remember correctly) is route vonovox's output to FL studio and then use output from FL as input in e.g. games
Anyway, definitely possible
Not sure what he does (he's probably sleeping now and will respond later) but I guess 2 virtual audio cables would do the job
Unless FL studio ships with something like that built-in
Definitely higher delay than without FL in between
But opens up room for a ton of effects
And glados sounds great with that autotune
The effect used for her specifically costs money I'm sorry, also you'll need Voicemod, Fl studio, Vonovox/Wokada, vac lite, VB cable, and voicemeeter
I can't help showing the setup atm as I'm already in bed and turned my computer off
It also requires that VB cable doesn't kill itself upon use
Tried the same setup I have on my pc on my laptop and the one on laptop explodes and fizzles out
VB cable is strange
Ooh not asleep after all!
If VB cable is a pain then paid VAC offers up to 256 cables so xD might be worth it for someone that needs it often
It's not expensive IIRC
Icky
No money for my setup that's bad for business
It must stay free for the people
The uh, two paid voice effects don't count btw uhmm 👀
They're for personal use on models
I was about to mention them
Too late
xD
Hehe beat you to it
But ye the autotune and ultrapitch are basically to make the voices sound more accurate
Autotune for Glados and Cyn and ultrapitch for anything droid related
Yee I remember the glados example was awesome
:3
I really should remake some of those droids
They're funny
The b1 is hell to go through over an hour of audio from just 1 game
I want more of General Grievous in games but nobody ever focuses on clone wars era anymore
The turrets also are quite silly with autotune, I've recently made the effect more tame over time since it was too much before, u can see the change over time especially for glados
you're welcome, seems like some of the files didn't download correctly at first lol
yea
my gpu is 5070 12gb, will using average models increase much input latency on games like valorant
not really if u lock ur fps
but it will increase input latency right?
is it that much latency to consider?
a little bit yeah, but I doubt it will be noticeable if u put fps limit
how about gpu usage? will it use as much as gpu resource available?
on average models
What model?
this one for example
the game consumes only 20-30% gpu usage
is input latency affected if i use in game?
Applio RVC, Vonovox or W-Okada voice changer?
wokada
Which version is your W-Okada?
v.2.2.2-beta
Old and outdated as hell
for example
Vonovox and Tg Develop's voice changer (b2397) are the only known voice changers that work with the NVIDIA GeForce RTX 50 series.
U should switch to Vonovox, with your gpu it'll run much better and it's the current best real-time voice changer
oh alr
so this one
i need an answer for this
That's basically another RVC voice model. Almost every RVC model works pretty much the same. The audio latency depends on not just the settings.
thanks
yeah i gonna research it deeper later
but before that
i have to know
using voice changer will increase input latency or not?
Like Namari said it's just another voice model, doesn't affect how the voice changer would work
i dont care about audio latency
That's not how they're supposed to
input latency from mouse i mean

i thought using high graphics pushes gpu usage further and makes latency higher
so using a voice changer would be the same way
the voice changer also pushes gpu usage further
Don't use high graphics when using a voice changer and gaming
It's not good for either program
Nah, you're fine as long as you're leaving reasonable headroom and not bottlenecking
valorant usage on gpu is only 20-30%
so this is totally fine to use voice changer beside
is this right?
Most likely yeah, if your usage is up to 30% then you definitely already limit framerate
Should be good
But anyway, just run the voicechanger and if you still have a bit headroom it's all good
If not, lower the settings or further limit the framerate
Anything that will lower the load where it struggles
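If you want numbers instead of guessing, `nvidia-smi` can report GPU load and VRAM while the game and the voice changer are both running. A hedged sketch: the query flags are standard `nvidia-smi` options, but the 80% threshold is an arbitrary assumption for "reasonable headroom", and the sample reading below is made up:

```python
import subprocess

# Standard nvidia-smi query: GPU load (%), VRAM used and total (MiB).
QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_line(line):
    """Parse one CSV line such as '30, 4096, 12288' into a dict."""
    util, used, total = (float(x) for x in line.split(","))
    return {"util_pct": util, "vram_used_mib": used, "vram_total_mib": total}


def has_headroom(stats, max_util_pct=80.0):
    """True if GPU load is below the (assumed) comfort threshold."""
    return stats["util_pct"] < max_util_pct


def read_gpu():
    """Query the first GPU; needs an NVIDIA driver with nvidia-smi on PATH."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_gpu_line(out.strip().splitlines()[0])


# Made-up reading, similar to the 20-30% usage mentioned in the chat.
sample = parse_gpu_line("30, 4096, 12288")
print(sample, has_headroom(sample))
```

Call `read_gpu()` during a busy moment in game; if the load sits near 100%, lower the graphics settings or the framerate cap as suggested above.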
oh thank yall for answer
i found the gpu sometimes jumps up to 50% in combat
any way to limit gpu usage of the voice changer?
so i wont get laggy on combat
Make sure to try Vonovox.
ah i see. tho if possible, whenever you get the chance, could you show the setup or explain it in more detail so i can try it when i have the time?
bet i will try
Whats the easiest way to train a voice model besides Applio?

Sure I could show it at some point, the whole thing is just kinda a lot of vac and VB cable connection to many things but it does cause latency increases
What is the best free rvc to use rn
nice ty so much :D
No idea.
Wdym? For making rvc models or for realtime use as a voice changer?
Also what gpu do you have (Nvidia or AMD)
And what are u wanting to use it for?
Nvidia
Real time
Trolling
Aw :c
With friends
What kind
What
Why trolling? 
just realized your pfp was ENA!! Peak
Ye I saw this art on insta and thought it looked very beautiful
The full image is very nice
What is the best free rvc to use rn
You should elaborate on the previous statement you made
What statement
sick i dont see many BBQ artist on my insta
for me its usually vocaloid stuff
Why would you want to use it for trolling, what even is it for
Why are you asking?
Is this any of your business?
If u like a few Ena posts you'll get flooded with it eventually
Because if you read the rules you'd know that using those e-girl models for that is not allowed here
maybe just to see if ur gonna be doin too much (bad stuff) with the vc
I am not
Then explain
Was gonna use markiplier voice
You could say that.
I'm not a mod sadly so I cannot help with that
But the best Nvidia voice changer to use for free is Vonovox
why are the egirl models posted here instead of a different area it seems counter productive no?
Got download link?
It's not technically breaking any of the servers rules to make them sadly
So we just go like eww to whoever makes the model and move on
If I was a mod I'd delete the links to all e-girl models immediately
If the weirdos need them so bad go find them yourself
It should be posted above, just scroll a bit to find it
now where would teto, miku, and vtubers + every other (technical e-girl) go? wouldnt they be included?
i say technical e-girl because arent there some that are quite literally just a female voice
That's different
like i think callie? idk
They are actually cool unlike "whispery mommy girl voices"
Sorry if I'm missing something. But why this question?
it was when local said if he could he would remove the other e-girl voices
and i more than likely didnt understand what that truly meant cuz isnt an e-girl just a girl on the internet
but then he went further to explain that he meant the "whispery mommy" shit
which tbh is understandable
It's like comparing these two
Second image taken from a post here in voice models
First one is cool and very interesting, second one makes my skin crawl
There's a difference between an "E-girl" voice model that basically sounds like an actual woman and those funny Vocaloid/UTAU voicebanks. Hope this clears up a bit. 
While they might both be girls one is definite and real and the other is bait
hmmm what about people like the person who voices the shark girl in ZZZ (i cant remember the name), would that be in the second or would that still be in the silly area?

Just look up girl in the voice model section, you'll see the difference in what I'm talking about
Who is this shark girl
hold on lemme find
More like Ellen Joe from Zenless Zone Zero.
YEAH HER
really? oh you know that does make sense now that i think about it
yea
all the vids of people going in cs lobbies or other games with a "e-girl" voice changer
we used to have a commissions channel and some people would like center their "shop" around egirl voices
the old server owner also made a egirl voice
in #1175430844685484042 there was a egirl tag
Ick
a now very bad and not properly made pretrain was made to make them better
This makes me sick ❤️
you used to be chill with it 🙏
mmm ok i was just curious cuz i was reminded that egirl voices were looked down upon
By most of the server yea
maybe you didnt say it but you certainly didnt have this much disdain for them
The many many weirdos joining and asking over and over to troll has made me dislike them more
They are all the same too, no personality
Just freaks
well one more question before I go that isnt a moral one
is it better to use models with a high amount of epochs or low (high = >150, low = <=100)? i dont really know where it truly counts as high or low so im just airballin it
what does it mean?
Just how many times the model saw the dataset until it sounded good to whoever made it
oh thats actually pretty cool
A model could sound better at 100 but somehow get worse at 130
And then better again at 150
It's funny
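Put differently, more epochs isn't automatically better, so it helps to keep several saved checkpoints and pick the one that sounds best. A toy sketch of that idea; the scores are hypothetical listening ratings invented for illustration, not real measurements, and `best_epoch` is just an illustrative helper:

```python
# Hypothetical 1-10 listening scores per saved epoch (made-up numbers).
scores = {50: 5.0, 100: 7.5, 130: 6.0, 150: 8.0, 200: 7.0}


def best_epoch(scores):
    """Return the epoch whose checkpoint got the highest score."""
    return max(scores, key=scores.get)


print(best_epoch(scores))
```

With these made-up ratings the pick is the checkpoint at epoch 150, even though training went on to 200.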
ohh like that ai walking experiment where that starts at gen 1 until whatever and starts walking but then could just start flopping outta nowhere
ok well ty again :D
when I click run-applio.bat how long should it take for the window to show up in the browser?
Idk I use applio on kaggle
alrighty imma prolly study for my final exam soon until then
Ah school, trauma is already flashing through my head like war
I'll get to business then before I start having a mental breakdown
ye im studying mechanical engineering so it truly is hell
lol
How do i get it to read the dataset in this path? I am using https://colab.research.google.com/github/iahispano/applio/blob/main/assets/Applio.ipynb and running on public URL
<@&1159293204038955078>
make sure to use that ping if you need help quickly
ah thank u
@viral mason is there a tutorial for it
no all tutorials on yt are outdated and do not use the new stuff
for vonovox just run start
and vac lite run setup64 then install driver
heyyy
Hiii do u need any help?
yehh'
what with?
Hey, can I ask about co-founders here?
Hey guys, do you know any free AI detector and humanizer tools? Not paid one tho,
oh sorry no
I don't maybe someone else does tho
@soft karma what in particular are you looking for in rvc?
I saw ur messages in ai chat
yeah, just asking tbh. I don't know if anything really changed or if is there another model like it
it's just fun to hear specific characters singing or saying something i want
i've heard the TTS space is getting a lot, but not sure if what rvc does have a better alternative
the only real improvements are applio being what's used for training, two new good pretrains (legacy core 1.5 and 1.6) plus the pabp one which is also decent, and for realtime Vonovox being the best (Nvidia only)
@viral mason #1175430844685484042 How can I access this channel?
can you not view it?
It's not showing up; nothing happens when I click on it.
hm, i see. Not much then
still, where can i see all that? want to fiddle with it a little
@viral mason Can you help me? I can't view the channel.
idk what would cause that, did you try closing discord and reopening?
how exactly? I'm unsure what you're looking to use
I turned the computer off and on again and it fixed the problem.
Thank you
you're welcome
what rvc does, basically
speech conversion
change voice of speech/singing based on a voice model/reference
it can be run either locally on pc or in the browser like kaggle or google colab
yeah, i remember using it in the past. Even training some models. Just wanted to ask if anything better released since then
how do i fix Applio port issues? i just get "Failed to launch on port 6969" etc
rvc is still the best speech to speech out there sadly
are you using it locally?
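If it is a local install, one thing worth ruling out is another program (or a leftover Applio process) already holding the port. A minimal sketch that checks this with the standard library; 6969 comes from the error message above, the helper names are made up for illustration, and this doesn't use or assume anything about Applio's own code:

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Try to bind the port; if binding fails, something else is using it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            return False
        return True


def first_free_port(start=6969, tries=20):
    """Return the first free port at or after `start`, or None if none is found."""
    for port in range(start, start + tries):
        if port_is_free(port):
            return port
    return None


print("6969 free:", port_is_free(6969))
print("first free port from 6969:", first_free_port())
```

If 6969 shows as busy, closing the old process or restarting the PC usually frees it.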


