#✨│ai-help
1 messages · Page 327 of 1
gotcha, ty
Idk if my settings are good, I'm still struggling with delay.
hello if the model i found and use did not have an index file included, can i use an index file from another voice i downloaded? can the two work well together?
try turning on AP BWE 48k Upscaler
and use rmvpe instead of fcpe
bro???
there's still like a good second of delay.
<@&1159293140440723499>
Weirdo 😭
I hate those things 💔
Yea!
i mean that's pretty good, if you want I can give you settings to lower it kinda sacrificing quality
yeah, sure
F0 method: fcpe
AP BWE 48K Upscaler: off
Block Size: 0.10
Close every other program in the background
but, you can't get "0 delay", that doesn't exist, 1 second was already good enough
clean curve is more robotic sounding becausing when sighing its mostly just "white nosie" (air) on a graph this looks very fuzzy broken line with a bunch of random dots
screaming creates overtones and distortion. the pitch tracker sees a 5 or 6 different lines at once and doesnt know which one to follow
humming often have "vibrato" (shaking) or very low frequencies that drop off the bottom of the map
so basically the reason why clean curves make u sound like a robot
Standard pitch trackers (like the older PM or Harvest methods) are programmed to force your voice into a "clean curve" by deleting the "messy" parts.
It sees the air in your sigh as "trash" and deletes it
Result: Silence or a robotic "click."
It sees the raw energy of your scream as a "glitch" and tries to flatten it into a steady note
Result: The "Reading a Book" sound.
^^ RMVPE
FCPE doesnt try to force a clean curve it is context aware meaning it looks at the "messiness" around the sound and realizes its not a glitch its a human being breathing so therefore it maps the "messy" data more accurately so the AI knows how to produce a "messy" (human) output instead of a "clean" robotic one.
^^ FCPE
@viral mason (lots of thinking btw)
ty! what you've said has really helped me understand more about the two and other things I didn't even know about
I will do some testing tho to see what I like
@kindred surge , @deft pewter
btw the better the model then the better it will be! if u use a trash model it will still work but wont be as good quality.
what happened to those two accounts?
discord disabled them
true, that's why I train my own, been doing it since like 2022/2023
first account lost my 2fa since someone stole my phone, second account got banned aka suspended due to a server i haven't touched which i forgot i had a bot that i made which deletes alot of channels and recreates channels that have messages sent in etc, basically a server renovator which u can use to code a whole server with i just added hella shortcuts and improvements so i was able to optimize a server that is worth 11k lines up to being the same output just instead 100 lines
of code
got banned because of that stupid reason which had no other members other than me and the bot lol
you can friend me if you'd like idm and anyways i have to sleep, good night!
goodnight new friend! you seem cool so I will accept
nice game favorite xd.
-# mines Death Note: Killer Within
anyways srs gtg to sleep.
night night
yea I pretty much live in vrchat and recently been getting really into minecraft (modded) with a friend of mine
Is there anyone who can do this? By sharing it as a request.
I have a question. I am trying to use TTS Voice Wizard and Okada together. I managed to get TTS Voice Wizard and Okada to work, but my last remaining problem is routing the output of Okada to Discord. The easiest solution is to use a separate virtual cable... but as far as I am aware you can only get 1 virtual cable through VB-Audio... unless you pay for more. But I've heard you can use VoiceMeeter Potato's Virtual Input/Output to route the audio from one application to another, but I have no clue how to do that. So, I am here. Wondering if anyone can help me with my current predicament.
Edit: Oh I almost forgot.
*AMD Radeon RX 5600M (surprisingly not struggling as much as I thought it would)
*Windows 10
has anyone explained the difference between RMVPE and RMVPE+ in here?
discord search eefs up the + part so I can't search
aihub docs doesn't explain it in detail
I haven't seen anything about RMVPE+
I don't know if that's a thing
I tried using the two softwares together it might be impossible
I tried voicemeeter + vac lite + vb cable
and also having fl studio and voicemod
nothin

Maybe I will have to pay for a separate virtual cable after all... I mean, it should work, right?
I wouldn't do that

try asking a helper maybe like the one online that I cannot pronounce their name
just ping the role they should be able to try
I just wanted to use your GlaDOS voice model alongside TTS. Why does the world conspire against my simple wish 
I just use it in realtime with a autotune effect from fl studio
That's the easiest and most correct way of using it, yes. But I've gone through all this chaos to make the TTS sound like GLaDOS lmao. (Actually do you know any other TTS application that can work alongside the voice models?) If there isn't, that's fine. I guess its finally time I give up being a coward and actually use my voice
... which would probably be a good chance of pace
sadly I don't know any software that can do tts and connect to wokada/vonovox
example of it in realtime
You might just've convinced me to use VC normally. Thanks
lol you're welcome
btw if u want the plugins I used the autotune is kinda pricey but worth it for glados
gform is free
I will keep that in mind. This is my first time messing with this kind of stuff, if I end up enjoying it more than I expect, I might just buy the plugins.
it's well worth it tbh, tho keep in mind the setup requires Fl studio, Voicemod, Voice meeter (I use banana) Vb cable, and Vac lite
all of that stuff is free tho
Well, let me use the snipping tool real quick. I will be saving this for later. Thanks :3
you're very welcome!
you can add me if you want and I will gladly help you ^^
btw what gpu do u have
for realtime AMD can use Wokada tg fork, and Nvidia can use Vonovox and tg fork
vonovox is what I used in that example
Oh like I said it's a AMD Radeon RX 5600M. The thing is still working somehow. I should upgrade the setup pretty soon. I bought this ages ago when I was a little too short on cash
whenever u get the money I suggest anything like a Nvidia 3060 or higher
(If I knew all the tinkering this rusty thing needed from the start, I would never have bought it lmao!!!)
lol
oh. That's surprisingly cheap.
but then again the last time i checked GPU prices was ages ago
i occasionally come here to train some rvc ai voice model, can i know what's the newest collab to train models or where to? and if i have to use a different pretrain for spanish talking voice models?
im training with a dataset of 2 minutes on spanish
normal talking model no singing
oh yeah and id like to know what kaggle notebook you use for training
well, managed to make TTS and okada work on discord. Sounds as awful as you'd imagine. I am really better off using my own voice from now own
might be better to just use TTS with a VAC
what's your pc gpu and os?
Might I ask what a VAC is?
oh wait google exists for a reason
Virtual Audio Cable
Last update: July 28, 2025
you mean like VB Audio's virtual cable?
that's a VAC, even though usually for windows users VAC Lite is suggested
you could check the guide to know more
I am already using the aforementioned VB virtual cable. But i will certainly give lite a try
Virtual Audio Cable and VB-Cable are two different programs made by different authors.
I see
How do i make ai covers?
can I use spinv2 for 48khz dataset? seems like there are no chat logs for this
whenever i speak sometimes its like shaky, like it just goes low kinda robotic-y, i tried changing settings but its still the same, could someone maybe help config?
RTX 4070
windows 64x
trying to use the voicechanger
there are no 48k pretrains with spin-v2
as far as I know
yep digging into the discord and other communities to find that out on my own
but thanks a lot
I just gotta try both legacy 2.9 and og pretrain for 48k and see which is better
for realtime and TTS not singing
FTR spin is not very commonly used because there is no actual spin pretrains, only cvec pretrains finetuned for spin
which is far from perfect
Getting claude to give higher detailed work and higher understanding abilities
I saw people talking about spin having no pretrain yet about two years ago posts and messages and we still haven't got it yet? 😔
the art of training a pretrain from scratch as good as OG remains a mystery
yeeeeep
some people like Lyery have done a lot of work in this direction, but eventually the OG just remains better for some reasons
Ignoring me😔
og 48k pretrain, legacy 2.9 and maybe 1.5 but I kind of know 1.5 is solely consist of singing data not talking
so unless that barrier is somehow passed (details behind OG pretrain are found or perhaps someone does something better by guesswork), i doubt we're getting any new pretrain-from-scratch
it's like I'm in the field of expertise where we lost our technologies in the dark ages
not in a literal sense but yk
IDK really, I've never trained with spin. But in general 2.9 is not really used and (at least with cvec) OG and LC1.5 are most widely used
Lyery even removed 2.9 from this #1235952130855010365
that's what I found out in most messages, but I've heard 1.5 is keep putting vibrato and such in just normal talking due to the training data it had
and even though 2.9 is having issues with cracking, I saw some people recommending it for realtime talking models and TTS
that's why I decided to try it out
can't go wrong with OG
oh the og one uh
but personally I didn't experience the vibrato issue
48k og somehow have a bad rep
and training on singing dataset doesn't mean the pretrain is bad for normal speech
it should do fine
perhaps. Can't elaborate here, I train 32k anyway
time to whip my good ol 3060ti for training the same dataset three times in a row with each different pretrains
yeah, that's always a nice thing to do
actually, for various data, different pretrains might turn out better
these kinds of personal experiences that make me try ALL THREE OF THEM
As i said, usually with training people go for either OG or 1.5. And sometimes one is superior, other times the other one
make sure to test it with samples containing long held notes etc., because IIRC that's where the vibrato is supposed to be hallucinated in
these are the types of situations where I really want just one superior-in-every-single-way pretrain to exist
thanks for the tip
I think we're far from that xD
Training is unpredictable, different kind of training data produces vastly different results, so a single, generalised approach is not the way to go, ATM
actually now that i think of it, i don't think i did that test personally.
I mean only singing tests like that, but in there vibrato is usually already present in the input sample
I guess i should just record myself holding a long note
oh just one more thing before I go, I personally experienced that 32k have clear audio quality decrease compared to 48k models when talking, this has been the same case for my friends to audit via discord claimed 48k sounds quote "almost like as if you changed your mic to a more high quality and expensive one"
should really try it out yourself with normal talking and throw in some long notes
maybe we'll find something interesting
dunno, would be easier to comment on with some audio samples for comparison.
Obviously 48k has a wider frequency spectrum so it is more detailed
But not sure if that's what could instantly be perceived by people over discord as "higher quality mic"
eh one of them have crazy equipments so I'd assume that could be the reason
I would expect it to be more of a "different sound characteristics" case than simply a wider spectrum
but that's just a guess
I don't think 48k has a clear improvement over 32k in terms of audio quality THAT much
From what I heard, sybilants and breath noises are more easily going bad in 48k compared to 32k
I think that was one of the reasons 32k is recommended over 48k in most cases
But I can't speak from experience with 48k because there's almost none
THAT I do experience everytime I use 48k model
trying like ship, chip, keep, steep then say something without sybilants
huge artifact difference
could be the dataset problem but at least I do experience that with 48k model
I guess there is a chance that some dataset preprocessing would help
but then I realize this whole RVC thing doesn't handle high frequency that well in the first place
we even numb some high frequency poking out in the dataset a lil too
Sometimes that's what the workflow looks like for me. Prepare data -> train -> notice the model learned something bad -> process the data differently to hopefully avoid that and retrain
so I guess sibilants could be naturally hard or even impossible to make naturally in any model
yeah, i think the difficulties of 48k over 32k might be a result of literally that. Wider frequency spectrum, more room for high frequency details which can be difficult to replicate properly
(FTR this is just a speculation as well, someone with better experience here could be able to tell for sure)
Honestly, if I did separate audios, void out non-hearable parts of the dataset, thumb down spiked frequencies, clipped every speech within 2 to 5 seconds(maybe 10 if can't be helped)
THEN something goes wrong then I legit don't know what to fix from the dataset at that point
in my case I often just concatenate all samples into one file, then usually go the lazy route and use Codename's smartcutter which finds silent parts and replaces them with ~100ms of pure zeros
and then splitting into short clips (3s i think?) is done as preprocessing in Applio
so that's two of the things you described done more or less automatically
(definitely not perfect, obviously)
I guess I have that autism to not be able to do that and resort to manually doing everything for that matter to not void out any breathing(except heavy breathing or rough breathing)
understandable
the smartcutter was kinda made specifically for this purpose
so ideally it should keep breaths and remove dirty silence
tested it a bit when it was in development and it looked nice
I'll definitely try it out as well
compare it to my manual work and hey, if it's doing better job than me
fine by me
Is selling ai tool allowed?
maybe I'll upload a British accent realtime model when I'm done with this too
could be a far stretch but oh welp
Is selling allowed here?
there's also a debug flag in the script that inserts noise into where it detected silence. That way you can compare side-by-side the source sample to a processed one and see what was and what wasn't considered silence.
Gives a nice overview of how it performs with your data
No, i believe this falls directly in the "advertising/promotion" category which is against the rules
that's really nice thing to have
actually it could go up to the AIhub as well if it's really nice
Simply no.
Is this supposed to be a help question? The way you word sounds more like a comment.
It is a question im working on claude code and its giving crappy results when i know it could do better ive been trying at it maybe i need different prompts
Try ask Claude for more context (which project are you tryna do) and logic before you tell it to generate final results in another prompt, it should help. If there's any related tool in chat, try one of them. You wouldn't always ask Claude (as well as many other chatbots) with something simple like "hey, generate me some code" and expect it to generate that way.
Yeah i mean right now im geting claude to generate its own plan from the trending datas and perfornalism of a high quality working studio team for claude code so it should have the context needed
Im using claude code with VS studios that using mcp to connect to roblox studios
what can i use to make good rvc v2 models for free
(i need them for Replay bc weights.gg shut down)
Applio RVC is another RVC software but it's more manual than automated.
question is using spin v2 better than content?
im hearing that using spin v2 is better when combined with fcpe
no
there's no good spin v2 pretrain, the one I trained from scrach using refinegan is unfortunately not good for realtime and stuff
it needs more data as per dr87
i heard spin v2 is pitch invariant tho
what
like something its better if you dont sound the exact accent of what the person ur trying to replicate sounds
im just trying to say i did search it up and found some ppl who used it say that spin v2 is good for sounding exactly like the model voice?
both for model training and tg devs fork owkada
but im not 100% sure
so i want someones confirmation
Contentvec voice models are more common than Spin V2. Spin V2 ones are more niche, experimental, and rare to find in #1175430844685484042. FCPE is another F0 model used in RVC, it might be faster but the accuracy won't be the level of RMVPE.
fcpe for my experience delivers almost the same accuracy as rmvpe and js faster
also with more benefits?
tbh i see the opposite for my side
but yeah
i'd like to be able to have my screaming, humming, laughing, etc. sound better
What's your PC GPU and OS?
Elaborate the tutorial link you're using
What's your PC GPU and OS
RVC is limited on that
yeah i know
I’m looking for the best AI subscription to use daily for things like random questions, analyzing situations and decisions, & helping me grow a business. I’m also somewhat interested in AI video tools like Sora or Veo but thats not a priority. Any recommendations on the best subscription?
Why are there two fcpe?
i added that myself dw
slight visual bug but eh
I once paid for Google AI Plus (including Gemini and Drive), but I didn't actually utilize the full potential features these tiers offered. 
What is the best rvc voicechanger

What is your PC GPU? And what will you use the voice changer for?
:V Does anyone know how to make a RVC Voice Model? Cause I want to make Ai Covers with them
Inferencing (AI cover) and training (like making a voice model) are two different processes. What is your PC GPU?
Windows or MacBook? On Windows, open Task Manager.
I'm on windows yea
3070 is my gpu i want to test out how realistic these free voicechangers have become
Windows

I opened it, now where do I check my CPU?
Go to Performance tab, spot if there any GPU 0 or GPU 1, one of them could be a dedicated one.
I saw that there is a v2
Of the voice changer
I just know about the w-okoda one
There's Vonovox, an alternative voice changer that gives better audio quality than any W_Okada version.
Does it jas delay
Mine is GPU 0
Like what is the delay of it
Like the GPU Memory?
Just that one? Well, that one is probably an integrated GPU, and of course it's a laptop.
I got a laptop and that is my gpu
What is ur gpu
My 2012 laptop doesn't have any GPU appear in Task Manager, so not ideal to run any voice changer. 
Ok
Imma first try w okoda out and then vonovox
Thx
But even if my laptop had a dedicated GPU, it probably might be one of those NVIDIA GeForce GT 600 series, which is also old and won't gonna work with any AI program.
Check out a specific version of W-Okada (b2397) made by Tg Develop.
K thanks
Have a great day
Why did #🔍│find-models shut down
What happend
I joined like 3 days ago
Look up in #✨│announcements.
K ty
where i can download RVC?
Don't try to ask something simple. What is your PC GPU? Realtime or non-realtime? And what will you use the program for? This is how I ask people here every time.
im assuming their probably here for ai rvc real time voice changer.
considering they joined the server last month + the fact their playing a roleplaying server.
Well, that sounds a bit awkward. 
why awkward?
my gpu is GTX 1070 8gb, I used the program to test voices
But I formatted the PC and I can't get the program
Not enough answers.
🤨
RVC (retrieval-based voice conversion) doesn't always mean realtime voice changer, at least be specific.
Does sm1 have a Vox Hazbin Hotel english RVC model here
Bc all the old ones got taken down 😅
Check in #1175430844685484042. If the voice model hasn't exist yet, you might wanna make a request in #1159289738314919936 instead.
Oh okay tysm! Would u recommend me to train my own or should I request one?
Any choice that you can do.
can you elaborate:
- your pc gpu
- what you're trying to do: TTS, AI Covers, E Girl Trolling / Catfishing, Roleplay
RVC means Retrieval-based-Voice-Conversion but
it's better to get more info first, many confuse RVC with realtime voice changer lol
yh i know
but i searched on her profile
shes playing a gta in a roleplay server
also she included "used to test voices" shows she prob means about real time voice changer.
|=1. i suggest you to use cloud based servers, your gpu isn't gonna handle the voice changer really well when running heavy games so i suggest you use cloud based servers instead.
|
|-2. if your new to using voice changers i suggest you use "Tg Develop's W Okada Fork"
|-|
|-|-2.1. if you want to train a model instead + with a voice changer embedded in it use "Applio"
|
|-3. if its laggy close any other tab and leave only kaggle and the main tab, everything else just close it.
|-|
|-|-3.1. if it sounds choppy go to task manager and find the tab that holds the main tab and set its priority to high ( don't ask why it just works im too lazy to explain )
@lime acorn
it's better to be sure
also the more we chat the more we level up to new roles right?
the messages contribute to levelling up to roles yeah
fire
by the way i assume you know alot about the bot cmds right? can you possibly tell me about them all
-prefix-commands
Commands that give you useful information that work with the prefixes (- or !)
Shows this list
Shows RVC Documentations
Guides for Audio Cleaning and making Datasets
A list of Useful Google Colabs Notebooks (Cloud)
A list of Kaggle Notebooks (Cloud)
A list of HuggingFace Spaces (Cloud)
A list of Lightning AI Notebooks (Cloud)
A list of Programs for using RVC Models in Realtime for Calls/Games (mostly Wokada & Wokada Deiteris fork) with Guides
Shows useful links about UVR, a program for vocal and instrumental separation
Shows how to ask properly for help
Shows how to search RVC AI Voice Models
Shows a very old Google Spreadsheet with old RVC Models, not much suggested though
Shows others that RVC Easy GUI is Outdated
Explains others that So-VITS-SVC is very outdated
A text explaining the difference between Java & JavaScript (for fun)
oh yeah you were an old member
wtf xd
ye i mean voice changer
-realtime
Guides for Programs that use RVC Models in Realtime for Calls/Games
A Realtime Voice Changer with similar performance to Wokada Tg-Develop Fork, with extra features, but it supports only Nvidia GPUs on Windows 10/11 unlike other options that have wider support. and without cloud options
A personal fork (modified version) of Wokada Deiteris Fork, it just adds some Quality of Life improvements to it like supporting Spin Embedder and Audio Effects. Don't expect too much about it since the creator made it originally as a personal project.
A Realtime Voice Changer with similar performance to Vonovox & Wokada Tg-Develop Fork, with extra features.
Deiteris' fork (modified version) of wokada that doesn't get updates anymore.
These options are not recommended for use.
Not suggested, older versions in youtube tuts are even way worse. GUIDE
The program is worse compared to the ones above, and much less updated. GUIDE
I'm guessing then you're atleast on Windows 10 or 11
are you trying to do GTA roleplay or e girl trolling / catfishing?
Hello!
Do you need any help?
No, just doin some commands. Thank you though!
windows 11, 64 bits, ye gta roleplay
if you need any help let us know :)
then yeah, you can use either wokada tg develop fork or vonovox
okay, where i can download?
-realtime
Guides for Programs that use RVC Models in Realtime for Calls/Games
A Realtime Voice Changer with similar performance to Wokada Tg-Develop Fork, with extra features, but it supports only Nvidia GPUs on Windows 10/11 unlike other options that have wider support. and without cloud options
A personal fork (modified version) of Wokada Deiteris Fork, it just adds some Quality of Life improvements to it like supporting Spin Embedder and Audio Effects. Don't expect too much about it since the creator made it originally as a personal project.
A Realtime Voice Changer with similar performance to Vonovox & Wokada Tg-Develop Fork, with extra features.
Deiteris' fork (modified version) of wokada that doesn't get updates anymore.
These options are not recommended for use.
Not suggested, older versions in youtube tuts are even way worse. GUIDE
The program is worse compared to the ones above, and much less updated. GUIDE
yep @kindred surge , @deft pewter , (now) @supple trail
hey guys, i just joined this server. think its a good server to learn stuffs about AI being an absolute rookie?
hey so im tryna setup okada on my gfs pc, and i adjusted the extra and chunk akkordingly to what it needs while gaming in order to still work, but for some reason after the pc has been on for a little the voices will somehow always sound like a microwave no matter what we do with the pitch, the chunk is in a very wide margin of range so it doenst stutter, the voice itself doesnt stutter but will never sound like the model at all in game, even when i test the exact same model on my own pc, where it DOES sound good, she has a 3060
which wokada are you using?
tg fork
or det fork
or just owkada
wokada*
seems that swithcing the pitch extraction algorithm almost entirely fixed it
tg-develop btw
does it sound
like choppy overtime after using it?
are you running it cloud or locally.
tried both client and server doenst rly make a difference, changing it to regular rmvpe instead of the rmvpe(onnx) it is almost completely solved
the voice would run smoothly but sound completely weird
like out of the pitch range
alright go to task manager
mhm
ye
switch its priority to high
allows the pc to focus more resources on the voice changer.
alright try using it now
if you still notice choppiness
wait there is 2 differences
start delay vs choppy
ye no it wasnt chopping/cutting out, it was just purely sounding off like a model would when u have the pitch on like -30
which one? (start delay ignores like the first few milliseconds or seconds of u talking then start changing the later recorded voice)
oh.
screenshot the model settings?
so for context, she wanted to use a roxanne from fnaf model, she is a female, so 0 pitch would work fine on that, like on my pc it worked for her voice, we move over to her pc, and the same 0 pitch sounds completely off, and by off i mean if u were to imagine going on my pc and cranging the models pitch down to -40 far outside the models range, where itll sound like a dying microwave, even tho the pitch and format and index were the exact same as on my setup, on her pc
everything at 0
her chunk size is 480 since it wont work on lower delay while playing a game
and 2.7 extra
How can i use whisper remotely but using the Microphone on the client ?
I'm guessing you're talking about Whisper V3 Large for STT, maybe try using a Virtual Audio Cable? what program are you using exactly
Buzz
Hi Everyone, I have a question. I am looking for an AI bot to switch to. Currently we use ChatGPT 5.4 at our company. We have spent 3 years training in in our thought process and writing. We want to be able to give employees access without rewriting or changing its memory without approval.
So we are looking for a chatbot that can do 3 things.
- Move the memory we have developed in ChatGPT over to the new model.
- Allow employees to have permission to ask questions, but anytime memory is tryed to change, it will not change unless it is aproved by the managing account.
- We want multiple people to be able to use the memory and chat at the same time.
- We also need it to be able to scale, as we grow fast.
Any advice?
how do i set up the cloud server, or is there a way to make it so local isnt extremely choppy and laggy and delayed
This is a general ai discord server, elaborate:
- your pc gpu
- your pc os
- what you're trying to do, like AI Covers, TTS, RVC Training, E Girl Trolling / Catfishing, Roleplay
- the tutorial link you're using
<@&1159293140440723499>
guys what are the best voices???
How do i make something sound more like the ai model than rely on the audio?
Bc its like 70% accurate
Can somebody please help me set up
I have a Gaming Laptop Katana A17 AI B8V.
So I should be good, but it dead never works for me I never hear no audio
Using the RVC Voice Changer?
I watched this Tutorial : https://www.youtube.com/watch?v=O3Q2urZSgM8
Can someone tell me if the following config Is accurate ?
This is a sophisticated "Remote GPU Pipeline" architecture. You are essentially offloading the heavy lifting of AI transcription from a mobile device to a high-performance home workstation. By using Buzz (a GUI for OpenAI's Whisper) and Parsec for low-latency streaming, you’ve created a real-time transcription bridge that bypasses the hardware limitations of a smartphone.
- Parsec Requirements: The Home PC (Host) must have the Parsec Virtual USB/Mic Driver installed. This allows the PC to recognize the incoming audio stream from Phone 1 as a local microphone input.
Here is how it compares to other methods:
.
- Parsec Host: In Parsec settings on the PC, go to the Host tab and ensure "Microphone" is set to Enabled (Persistent). This ensures the virtual mic stays active even if the connection drops.
- Phone 1: Install the Parsec app and grant it "Record Audio" permissions.
Phase 2: Establishing the Connection - Connect Phone 1 to PC: Open Parsec on Phone 1 and connect to the Home PC.
- Activate Mic Passthrough: In the Parsec overlay on Phone 1, ensure the Microphone icon is toggled ON.
- Phone 2 Relay: Use a screen-sharing tool (like Discord, Zoom, or a native "Smart View" relay) to stream Phone 1's screen to Phone 2. Note: Since Phone 2 is only for visual monitoring, this minimizes network strain on the PC.
Phase 3: Accessing the Configuration Interface - On the PC (via the Phone 1 remote view), open Buzz.
- Navigate to the Live Recording tab.
- Input Selection: In the "Microphone" dropdown, select "Parsec Virtual Audio." This is the crucial link—it tells Buzz to "listen" to the audio coming from your phone in the lecture hall.
Phase 4: Applying and Verifying - Set the Model to Medium, Task to Transcribe, and ensure Faster-Whisper is the selected backend for VRAM efficiency.
- Click Record.
PC are connected to School wifi and The connection internet isnt good so im afraid of Network Bandwidth: For live transcription, latency is less critical than stability. Phone 1 requires a strong upload speed at the lecture hall (at least 5-10 Mbps) to ensure the audio stream doesn't "jitter," which can cause Whisper to hallucinate or miss words.
anyone got a prompt to feed ANY ai model to make it respond like a gen z/teen from today?
?? can you explain more, just giving us what you want isn't gonna help you need to say what its for, what precisely it is about.
Start a new chat session, try type something like "you are 20-year-old person" then follow with something in casual style. 
What is your PC GPU? And what would you use the voice changer for? The initials RVC (retrieval-based voice conversion) doesn't always mean realtime voice changer.
I've seen your message in chat, and well that's awkward.
if I need to start development with AI to build apps and websites what the best platform can I use it first
Look up in #1175430844685484042.
Microsoft Visual Studio and Python. 
do you use cloud its good or no
?

?
For simple code or big fix, I usually use Notepad. I'm not really the type of a person to go full on coding AI so.
made my own trainer cuz i didnt like applios trainer lol
For more experimental features not found in Applio RVC, there's Codename's RVC fork. 
i just made a faster trainer thats all
and i train on kaggle.
I'm not ready to do the same as you, I still prefer the more-friendly one.
Don't ask me with just word "what", I don't know what to answer.
Why VS for python..? Don't you mean VS code?
What am I supposed to say?
You made me stress myself trying to say something for real. Anyways, sometimes I use Microsoft Visual Studio for testing some Python environments. Python is a programming language that itself also has its executable (python.exe). I never have Visual Studio Code installed on my laptop.
"stress"? No need to take everything personal, this is a place for discussions after all and I don't mean anything bad
I thought you actually meant Code because it's q very common, free and lightweight tool for the job.
Python development in visual studio itself seems uncommon to me. Not even saying that it's bad or good, just wouldn't be my go-to personally
.
I always wondered what separating audios with 0.3 to 0.8 seconds of void at the beginning and the end of the audio clips actually does
WDYM?
i didn't think there's any reason to include leading/trailing silence
Best voice model for what? There's like Superman, Batman, etc
Try different RVC V2 Voice Models, what program are you using?
The person who gave me this tip said and I quote "If noise remains at the beginning or end of a sentence, or if the waveform is repeatedly sliced during processing, the model will attempt to generate audio whenever it encounters a "silent space" at these boundaries. Simply put, it does not tolerate "emptiness." In other words, unless the model is trained on patterns where the beginning and end of a sentence are consistently silent, it will produce a higher frequency of what are commonly known as "silence artifacts.""
I was not sure if it actually did what he said so I wanted to share the question
sounds weird to me, that's kinda what applio's silence injection is for
it adds some samples of silence so that the model learns it
that same person is also a manual labor freak so he taught me to do that manually I suppose
i mean, things like this are so simple that I think scripting it would be the way to go anyway
why do something simple as that manually when it can be just automated at preprocessing step
but as mentioned before, i don't think it's needed anyway, considering silent samples are already added at extraction step
I think those who want to squeeze every ounce of quality from the dataset would do that
there's no quality difference between adding silence manually and adding it via code, to be fair
it's just appending zeros from one or the other side
ah I didn't count in the fact that applio automatically puts silences in audio clips when processing
ayy you're too kind
hello, rvc means Retrieval-based-Voice-Conversion, not realtime voice changer
can you elaborate:
- your pc gpu
- your pc os
- the tutorial link you're using
Oh just one more thing, I'm clipping audios manually and clip durations are not the same
should I be worried about that? because there are clips from 2 to 10 seconds
كيف الحال
This server is English only.
Basim
Is there anything you would like to get help with?
do you use the slicing from Applio? because this will matter a lot
let's say I don't use applio's auto slicing
and I have variety of clips from 2 to 10 seconds of durations I processed
would that be a no no?
in that case, good question...
Short answer: not sure and I would love to hear opinion here from someone with better experience/deeper understanding.
Long answer:
from what I can see, during training, samples are fed into the networks in equal-sized segments anyway. That's 0.3-0.4s I think, depending on sample rate.
BUT, each sample is considered one step anyway, so the influence of a longer sample will be quite heavierr than of a shorter sample.
So I would expect that similar-but-not-equal sized clips would be fine, but when there's a 5x difference (like 2s and 10s), the longer ones would cause heavier weight adjustment.
In short, I think equal-sized samples should work better. The generalization should be better and gradient more stable, because each sample should be "equally important" to the model
But that's just my understanding based on how it is utilized during training. I didn't ever verify the actual effect, so it's just my theory for now
shucks
I was focusing on not cutting off the sentence structure mid way til 10 seconds(since it was my personal limit for a clip to have) and it's varying from 2 to 10 seconds now :/
eh I'll try feed about 16 to 30 minutes of these and let people know on here
For the record, my first couple trainings were done in a scenario like yours, where sample sizes were like 2-8s with some outliers being as long as 14s. I think I didn't use the auto-slicing then. The models were quite nice anyway.
But then, it was the beginning of my adventure with RVC so it might have just been "fine to a rookie me", and in reality perhaps they weren't so good. Dunno, i'd have to dig them out from my HDD
do i have to update okada
Yeah, I wondered about this multiple times actually. But eventually what I currently do is concatenate all samples together with ~100ms breaks between them and then just let auto-slicing cut them into (I think 3s long?) samples. This for sure cuts some words in the middle, but at the moment I don't really care that much to pay attention whether it has any bad influence
@low shard free money glitch here
ah, so a single big audio clip and let applio do the slicing
I'm going for that invididually processed multiple audio files so I wouldn't know about breaks between audio files
unless I mistaken what you've said
Is your PC GPU still NVIDIA GeForce RTX 2060 or you got a better one? You were getting help about using the voice changer last year.
like, I already have a hundred ish wav files(2s to 10s short clips) in a dataset folder because uh
nobody actually said anything about "you should combine them all together with breaks"
hello people, are there any working colabs or online spaces to use rvc with or w/o ui?
or you can pre-slice them into some equal-ish lengthed samples beforehand in any way you want. And then skip the applio slicer.
It's just that i think there shouldn't be a huge difference between clip lengths
same onee
understandable
im still using the fork but forgot its name
What do you use the voice changer for again?
Applio RVC.
I'll update once I finish processing this dataset and train it
Link please
just using it with my friends
in games sometimes too
Sure! Keep in mind that what I said about the influence of inconsistency between sample lengths is my expectation and I have no proof for it. So there might be something wrong in what I said. But it seems logical, judging by how it affects the training process.
But yeah, I'm learning too
Love to suck in as much knowledge as I can, but there's still lots of missing pieces for sure xD
will check, pls be patient
thx
Why ComfyUI?
never tried it, but you could check https://github.com/AIFSH/ComfyUI-RVC
Contribute to AIFSH/ComfyUI-RVC development by creating an account on GitHub.
I can use in on cloud. are there alt ways to run rvc on cloud?
yeah, there is https://docs.aihub.gg/rvc/cloud/applio-cloud/
Last update: March 24, 2026
what's your pc gpu and os? have you tried checking if you're good to run it locally first tho?
Thank you so much!
you're welcome
it wont suffice, leave it
can you elaborate:
- your pc gpu
- your pc os
- what are you trying to do: like AI Covers, E Girl Trolling / Catfishing, Roleplay with friends
- the tutorial link you're using
alright
2060 windows 10 roleplay
Dont remember the video
How to train model?
The instruction on how to train a "model" won't be as simple as your question. May I guess, are you looking for RVC? What is your PC GPU?
Rtx 3060
8GB or 12GB? Like, try elaborate some more, bud.
Only 8 GB
What are you tryna train? RVC or Stable Diffusion? You still haven't answer this.
hey just asking, what do you guys usually do when within a clip, there's a noticeable pause to the sentence from 0.2 to 0.5 seconds?
like, for example, "sorry, I mean...(pauses for 0.4sec) I mean, it's nothing shocking, I'd imagine."
what do you usually do for that paused part?
do you void it out? or do you just let that part be as it is?
even some music clips I do atm have pauses in between lyrics
-realtime
Guides for Programs that use RVC Models in Realtime for Calls/Games
A Realtime Voice Changer with similar performance to Wokada Tg-Develop Fork, with extra features, but it supports only Nvidia GPUs on Windows 10/11 unlike other options that have wider support. and without cloud options
A personal fork (modified version) of Wokada Deiteris Fork, it just adds some Quality of Life improvements to it like supporting Spin Embedder and Audio Effects. Don't expect too much about it since the creator made it originally as a personal project.
A Realtime Voice Changer with similar performance to Vonovox & Wokada Tg-Develop Fork, with extra features.
Deiteris' fork (modified version) of wokada that doesn't get updates anymore.
These options are not recommended for use.
Not suggested, older versions in youtube tuts are even way worse. GUIDE
The program is worse compared to the ones above, and much less updated. GUIDE
im usin the deitteris fork
it's better to switch to wokada tg-develop or vonovox
what are those better in
they are more updated, it's better you check their pros and cons
which one is beer tg develop or vonovox
better*
that's actually where i used the smartcutter. It normalizes such pauses to ~100ms
I guess it can be done in various ways. Though probably automatic silence slicing from applio won't do a great job here
~100ms it is, thanks
it's better you check both their pros and cons and see what's better for you
i just need better voice quality
they offer different features, please just check and test both
vonovox might have more recent updates
how do i use voices??
How to replace person in a photo ?
Do you know a tool that let me mask
"Replace masked area with [describe person from file:1, e.g., man with short hair, smiling, same lighting and pose as original, muscular build, white tank top, party background, high detail, seamless blend
Check out Google Gemini's Nano Banana Pro. 
hey i need help with installing the voice changer on linux? (gentoo) i have a 3060 gpu
can you elaborate
- what are you trying to do: TTS, AI Covers, E Girl Trolling / Catfishing, Roleplay
- the issue
- the tutorial link
i'm guessing linux gentoo
its fine
it's all fine now after reading the docs?
yes
do i need to adjust the models based on my voice pitch (change it depending on my IRL voice pitch if i want it to sound like i saw in a video?)
If it's a female voice and you are a guy up the pitch until it sounds good, opposite for other way around
If you're a guy and use a guy voice pretty much always works well at 0
can i ask for help here for crashes?
or do i need to open a forum
You can ask here, I'm unsure on how to help though with any errors
But it does help opening a forum
nvm got it fixed
i found a model i really like but the quality sounds like a wakie talkie and i have no idea what to do about it. ive tried just about everything
hello
Can you give me the feature of sending a picture? I want to send a specific picture of settings to help me with it?
Who best voice changer
And why?
1- Tg Develop's W Okada Fork
2- Vonovox
3- Applio Realtime
4- Deiteris' W Okada Fork
if you have nvidia use vonovox, if you have AMD use wokada tg fork
if you have an Nvidia gpu yea
Okay thank you
yes i do but in the instruction i should use the vac cable lite ??,and yeah i use a virtual cable
it's not though, most likely was the model you used
yea u should use that one in the guide ^^
haven't tried it tbh, I dunno how good it is
alright,I will use it thanks ^^
Can i use tg fork?
I have rtx 4070
Sure, just download the two zip files from the guide that have 001 and pp2 at the end
I can't send the downloads ATM since I'm in the bathroom on my phone
Can you please send the pictures of the files and put them in one file and what to extract?
-rt
Guides for Programs that use RVC Models in Realtime for Calls/Games
A Realtime Voice Changer with similar performance to Wokada Tg-Develop Fork, with extra features, but it supports only Nvidia GPUs on Windows 10/11 unlike other options that have wider support. and without cloud options
A personal fork (modified version) of Wokada Deiteris Fork, it just adds some Quality of Life improvements to it like supporting Spin Embedder and Audio Effects. Don't expect too much about it since the creator made it originally as a personal project.
A Realtime Voice Changer with similar performance to Vonovox & Wokada Tg-Develop Fork, with extra features.
Deiteris' fork (modified version) of wokada that doesn't get updates anymore.
These options are not recommended for use.
Not suggested, older versions in youtube tuts are even way worse. GUIDE
The program is worse compared to the ones above, and much less updated. GUIDE
extract the first one then drag and drop the second one into the folder of the first one
then run mmvcserversio
I'm heading to sleep now so if you need help ping the helper role
what ai model did you try? Link it here
Link also the tutorial link you’re using and tell your pc gpu and os
can you elaborate:
- your pc gpu
- your pc os
- what are you trying to do: like AI Covers, E Girl Trolling / Catfishing, Roleplay with friends
- the tutorial link you're using
can you elaborate:
- your pc gpu
- your pc os
- what are you trying to do: like AI Covers, E Girl Trolling / Catfishing, Roleplay with friends
- the tutorial link you're using
Any pretrain suggestions for singers?
The ones I use tend to blend falsetto and chest voice
Are you trying to do e girl trolling / catfishing like the youtube tutorial?
Then why are you using a youtube tutorial for e girl trolling? That tutorial is outdated asf, you should delete everything
Hello, can you help me? I once came here to download a voice program, but in the end I couldn’t.
This is a General AI Discord Server, can you elaborate:
- your pc gpu
- your pc os
- what are you trying to do: like AI Covers, E Girl Trolling / Catfishing, Roleplay with friends
- the tutorial link you're using
rtx 3060
win 11
I want to change my voice in real time.
A friend showed me that he has a program. He said to download it from here or ask for help, but I couldn’t download it myself and didn’t ask for help at the time.
you shouldn’t reinstall the same thing, you should try wokada tg develop fork
-realtime
Guides for Programs that use RVC Models in Realtime for Calls/Games
A Realtime Voice Changer with similar performance to Wokada Tg-Develop Fork, with extra features, but it supports only Nvidia GPUs on Windows 10/11 unlike other options that have wider support. and without cloud options
A personal fork (modified version) of Wokada Deiteris Fork, it just adds some Quality of Life improvements to it like supporting Spin Embedder and Audio Effects. Don't expect too much about it since the creator made it originally as a personal project.
A Realtime Voice Changer with similar performance to Vonovox & Wokada Tg-Develop Fork, with extra features.
Deiteris' fork (modified version) of wokada that doesn't get updates anymore.
These options are not recommended for use.
Not suggested, older versions in youtube tuts are even way worse. GUIDE
The program is worse compared to the ones above, and much less updated. GUIDE
He said you can basically change your voice to any voice.
are you like trying to do e girl trolling / catfishing or roleplay?
What program are you using? Send the tutorial or download link
I don’t have the program. I joined that Discord a while ago to download it, but I couldn’t figure it out. It’s been over a year. I remember he was choosing characters there and changing his voice.
And I don’t have any tutorial links either
oh alright, do you want to roleplay or e girl troll / catfish?
I just want to make a Zelenskyy voice.
roleplay
and prank my friends
Are you here?
Hi ill buy a 3090ti fe hpu 5950x cpu and 32gb ram 3600mhz (might upgrade for more ram soon) pc , am i screwed, will the ai not work/choppy/not real time??
-realtime
Guides for Programs that use RVC Models in Realtime for Calls/Games
A Realtime Voice Changer with similar performance to Wokada Tg-Develop Fork, with extra features, but it supports only Nvidia GPUs on Windows 10/11 unlike other options that have wider support. and without cloud options
A personal fork (modified version) of Wokada Deiteris Fork, it just adds some Quality of Life improvements to it like supporting Spin Embedder and Audio Effects. Don't expect too much about it since the creator made it originally as a personal project.
A Realtime Voice Changer with similar performance to Vonovox & Wokada Tg-Develop Fork, with extra features.
Deiteris' fork (modified version) of wokada that doesn't get updates anymore.
These options are not recommended for use.
Not suggested, older versions in youtube tuts are even way worse. GUIDE
The program is worse compared to the ones above, and much less updated. GUIDE
try vonovoc or wokada tg develop
that’s an old message of a user who left
Will my specs be enough or im done for
Im guessing you’re on windows 10 or 1
Are you trying to: do ai covers, making models, e girl trolling / catfishing or roleplay?
Are you using any tutorial link?
Cover
Try look up earlier messages from this user, the guy still has not done with his PC specs choices even months later. 
So is it enough
yeah it’s way more than enough, but covers aren’t realtime so it seems you’re confused
But how long to wait til the voice comes out
I don't know, bud, I'm kind of really annoyed having to pinpoint the "final" answer for the same question. 
You didnt even answer months ago 😭
How about voice training will my specs be enough for clear voice
Don't expect too much from me, ok? I can only rate about your PC specs if it "enough", I'm not who to determine your final decision.
hardware doesn't affect the quality of the training, just the speed and memory limitations (e.g. how big of a batch size you can use)
anyway 3090ti will be more than fine
Speed and memory? So its the ram?
Any experience during training
I don't understand the question
Elaborate?
The voice training they say it takes multiple days to finish one training
It all depends
But no, generally no
I don't know the exact training performance of a 3090ti but the speed should be decent
but it will depend on some factors
mostly the amount of data you're gonna train with
Training time isn't that universal for all. Training one RVC voice model usually takes some hours, but then training time gets slower when audio dataset gets longer. Once you actually buy a new PC, you better try it out by yourself and let know how it goes. Sure, training a Stable Diffusion might take some days, but RVC is a different story.
What do you think about how I explain?
What pc specs do you have
Why?
Just asking what you staffs here use
I don't understand. What do you mean you want me to show my 2012 laptop specs?
-realtime
Guides for Programs that use RVC Models in Realtime for Calls/Games
A Realtime Voice Changer with similar performance to Wokada Tg-Develop Fork, with extra features, but it supports only Nvidia GPUs on Windows 10/11 unlike other options that have wider support. and without cloud options
A personal fork (modified version) of Wokada Deiteris Fork, it just adds some Quality of Life improvements to it like supporting Spin Embedder and Audio Effects. Don't expect too much about it since the creator made it originally as a personal project.
A Realtime Voice Changer with similar performance to Vonovox & Wokada Tg-Develop Fork, with extra features.
Deiteris' fork (modified version) of wokada that doesn't get updates anymore.
These options are not recommended for use.
Not suggested, older versions in youtube tuts are even way worse. GUIDE
The program is worse compared to the ones above, and much less updated. GUIDE
Do you need help with realtime voice changer or something?
hello
im trying to roleplay for dnd and was wondering if this video is outdated
https://www.youtube.com/watch?v=SxdnGxicJOg&t=302s&pp=0gcJCdkKAYcqIYzv
What roleplay? 
im playing as a half orc
What is that roleplay about? What is your PC GPU?
me and my friends are going to start up a dnd session this weekend
how do i check that?
Windows, Linux or macOS? On Windows, open Task Manager or press Ctrl + Shift + Esc, go to Performance tab, spot if there any GPU 0 or GPU 1.
ok i am on windows
NVIDIA GeForce RTX 2060
do you need help?
Help about which?
What do you use the uncensored LLM for? Well, um I'm quite not into LLM model stuffs so.
i ned ittt
I know more about RVC and the voice changer related topics, so sorry if I just came here to ask without waiting for more context. 

ok
What is your laptop GPU? You can try search for "an LLM model that can run either CPU locally" if this might help. Let know here if you have problem trying to install, I might be able to pinpoint an issue.
How much is your laptop RAM? For more performance and full access to some larger models, I'd suggest an online option, unless your budget constraints.
8
8GB is at least usable, although 16GB or more is more ideal.
I see so many people wanting to use real-time on a crappy laptop, I feel bad for anyone with intel
Good thing I have a still useless Vega 7 then
that's not an RVC Model btw
that's whisper, an STT model, you can use the download icon to download specific files
I downaloded already and It work
I'm guessing then you're not using anything RVC related
Yep
¿Where can I find NSF-Hifigan?
Where do I put the code? Should I create a separate cell?
Hi, i want to train my own model. Quick question. Is it okay to throw a 2+ hours of data in single file in Applio or should i cut it by parts?
I'm not sure why you're trying to 'integrate' HiFi-GAN NSF, since that's what's used by default when training a model in RVC.
Hey guys, I would like to ask if anybody knows an AI tool that can create avatars, for free
just like this banned account "rabbigoldman" was doing (he is still on youtube and I'm not trying to advertise that channel, I just want to know an AI tool that can mimic the job for free)
Oh, I didn't know that, hahaha, thanks!
Hi, i want to train my own model. Quick question. Is it okay to throw a 2+ hours of data in single file in Applio or should i cut it by parts?
Anyone?
Hi which real time voice changer is the best for now? I have macOS
what is your gpu (Nvidia or AMD) and what do u plan on doing with it? just curious

Don't try to ignore the Sapphire bot here, the bot gives you how to asp properly. What is your PC GPU? Did you follow any tutorial or guide before? And what will you use the voice changer for?
Don't skip steps. What is your PC GPU? And what do you use the voice changer for?
If you were to get a "job" in real life though, and you know this server might not be where to find a job.
i have been messing around with running an ai locally on my own computer
i set up lm studio and loaded google gemma 3 4b and then connected it to a small python script so i can chat with it using the openai style api but everything runs locally
nothing goes to openai or any cloud servers it is just localhost and my own machine. mostly doing it to learn how this stuff works and honestly it is pretty fun
does anybody know how to make my
ai uncensored? and what more could i do to upgrade it
m gonna be trying dolphin‑2.9‑llama3‑8b‑gguf
ram expensive
yesss am useing the voice channger for cz am bored
am useing right
gpu

You didn't directly answer my questions.
This won't be a time to goof around. To check your PC GPU on Windows, open Task Manager, go to Performance tab, spot if there any GPU 0 or GPU 1, one of them could be a dedicated GPU. If you mean something else, let me know at least.
Why do I have to tell you every step like that? Click on GPU 1 to real its full name on the right panel.
is it Nvidia or AMD
amd
they may be, uhm idk how to put it in a nice way but you get it
-rt
Guides for Programs that use RVC Models in Realtime for Calls/Games
A Realtime Voice Changer with similar performance to Wokada Tg-Develop Fork, with extra features, but it supports only Nvidia GPUs on Windows 10/11 unlike other options that have wider support. and without cloud options
A personal fork (modified version) of Wokada Deiteris Fork, it just adds some Quality of Life improvements to it like supporting Spin Embedder and Audio Effects. Don't expect too much about it since the creator made it originally as a personal project.
A Realtime Voice Changer with similar performance to Vonovox & Wokada Tg-Develop Fork, with extra features.
Deiteris' fork (modified version) of wokada that doesn't get updates anymore.
These options are not recommended for use.
Not suggested, older versions in youtube tuts are even way worse. GUIDE
The program is worse compared to the ones above, and much less updated. GUIDE
first one is the voice changer for amd Wokada tg fork, second one is vac lite, download both, vac lite ill connect the voice changer and discord or games
https://github.com/tg-develop/voice-changer/releases/download/b2397/voice-changer-windows-amd64-dml.zip
is it safe??
completely safe yea, from this server
ok
It would be better to ask for more detail, I know this approach gonna be slower; otherwise we're gonna let the person to use the voice changer with an integrated GPU, which would be quite annoying to get along. 
Hello, I have a serious problem. I used to train a voice that I was able to use with Applio, but now with the new Applio, when I insert the .pth file, nothing happens. What is the solution?
No.
Stay here, please.
@hallow thistle
or can u support call
Sorry, but I don't have time to go on call. Please, stay here for help about the voice changer.
I don't know, I don't use the Dione. I'm more of running the batch file directly.
It's the same whether it's on Dione or locally.
I know how the voice changer works. Sorry if I come out as angry, but this is how I usually talk.
Also, you can send your screenshot to here now, I'm more of this way.
Where did you extract Applio RVC? And did you run either one as administrator? Because if you run as admin, that might not gonna work.
That's not the problem; I think the problem comes from my D_2100 and G_2100.pth files. How do I convert them into a single .pth file?
Oh no. Well, I actually don't know about how to combine D and G files into one, sorry.
the setup is easy just extract both, for vac lite run setup64 then install driver, for tg fok run mmvcserversio
tried sending in dms since you asked there but I couldn't


?
Okay, so the question is, to use my model on an RVC, should I choose D or G?
you can't do that pretty sure, just use the huggingface links and put them into the download section of applio, go to the custom pretrain downloader
I could've go faster, but the bud seemed to made things slower to progress. Not only he did leave the server for whatever reason.
It's inevitable; I created this model myself, and it worked perfectly on an older app that also worked. Now I've downloaded the new app, and the model downloads, but it's unusable. I think someone with expertise in this area is needed.
people need to learn patience
From what I know, D (Discriminator) and G (Generator) are checkpoint files used during training. If you get these two files instead of "one", you might did something wrong there. The actual voice model is something like killer-fish_400e_1600s.pth, not as separated D_.pth or G_.pth".
And if I enter these models to create a new model, would that work?
To get a single .pth file, there are possible ways. Either train the new one from start or use a script that could convert into a single usable model.
Using a script would be a faster approach. But did you remember which sample rate was your model trained?
Does your Mac use Intel or Apple Silicon (M2, M3) chip? And what would you use W-Okada for?
what's the best site to use ai models on? And how do I use a model I've downloaded?
Yes, I remember, I still have the datasets.
what kind of voice trolling? like darth vader or goku or smth
What does this mean?
Which model?

?
I meant it to be interesting, though I wasn't sure what to say.
wtf 😂
Okay, I'm going to redo my model. I have 4 minutes of dataset. What parameters should I use to get the best quality?
Because I just did 100 pockets, and it stinks, compared to my model from two years ago, which is excellent
I mostly use batch size of 4. Model audio quality gets good at around 300 - 450 epochs at least to me.
What is your PC GPU?
please reevaluate your life
yeah i saw that on the guide
by the way once I download it what do I do with the model? Just unzip it?
What’s your mac? What’s your pc os? Are you trying to do AI Covers, E Girl Trolling / Catfishing or Roleplay?
elaborate:
- your pc gpu
- your pc os
- what are you trying to do: TTS, Ai Covers, E Girl Trolling / Catfishing, Roleplay
- the tutorial link you’re using
I checked your older messages, and well that's awkward.
If you have an Nvidia GPU, then yeah, get the 17_11 vonovox beta and it should be better than the w-okada forks
you can check here for download
nah, there's just a bug with using index files in that release and that command fixes it. So if you're not gonna use index at all, you can skip it
TBH don't expect a massive quality increase, most of the sound quality is a result of the model after all
But it should definitely be more stable, and apart from that, there's lots of improvements in vonovox that are not present in w-okada forks
Better pitch/feature extraction, improved stitching, volume handling etc.
So in those terms it should be better
Whether it's audible will depend
M4, I want to troll my friends on discord voice chat with some male voice
That's a new
WDYM by "install"? Offline inference on audio files or realtime?
For training or merging models Applio is still the right path
is there any alternate method to obtaining okada, the hugging face redirect is currently not working
Most recent "Applio RVC" version generally supports NVIDIA GeForce RTX 50 series. Download Applio RVC from Hugging Face. https://huggingface.co/IAHispano/Applio/tree/main/Compiled/Windows
Is your PC GPU still NVIDIA GeForce RTX 3060 or you got a newer one?
still the same
What do you use the voice changer for?
roleplay for dnd campaigns mostly
There's Vonovox. This voice changer is known to give better audio quality than many W-Okada versions.
and its rvc?
Like W-Okada, Vonovox uses RVC voice models.
is the setup any more difficult than okada?
i can give it a look for sure
Download Vonovox. https://huggingface.co/dr87/vonovox/resolve/main/Vonovox_beta_17_11.zip
where is maeko (Ajthefunky) rvc
goat tysm
While using Dione Launcher might be easier for most people, the downside is that it's harder to know what's going with terminal when you get errors. Some people would directly run the batch file instead.
Why male voice? Are voice models of famous characters not that interesting?
i want to unban me on weights discord
are you like looking to e boy troll / catfish?
ai hub isn't by weights anymore, check if there's a modmail or anything for weights
I wanted to know how many channels are posting AI Covers of many famous songs by AI, remaking them with different voices? Including a channel that does AI Covers with Ariana Grande's voice that sounds really good!
Until now, I only know Suno, which many say does well, but there's that copyright thing there that nowadays, you can't do anything, unless it's from a very unknown artist.
I wanted to make a website using ai but the ai's then to limit messages. I used claude and it gave me a good code, but there were some errors and things I didn't like but I couldn't make the changes since the messages keep getting limited, and this is taking alot of time. So what should I do now?
any advise on how to start fress....a newbie here 😁
What do you use the voice changer for? And are there even two different AMD Ryzen CPU models in the same PC system?
guide for this?
Which voice model would you like to use?
What is this? 
He's back @low shard 
Something bad happened?
If you look at his older messages here he's a little difficult to talk to and deal with, maybe he has changed
This is super outdated, since you have a 5080 you should use Vonovox
2 months old? Anything on yt about Wokada is outdated as they don't use the newest stuff like Vonovox or Wokada tg fork
Btw what do you plan on using this for, just curious ^^
That virtual audio cable can cause issues on windows but if it's working fine for you you don't have to switch, it's just recommended over VB cable (the one you have now)
Alrighty
I'm happy to see someone using the voice changer normally and not what most of the screenshots in hall of fame showcase
Hi I wanna help where I can find good text to speech ai same as applio ?
U could just use Applio still, I'm unsure of other ones that have free tts options
It's not working with me
When I use model it doesn't show up the voice or something
Applio is the best one but having this problem I can't even use it
That's odd, have you tried it both locally and on a browser site like Google colab or Kaggle?
I did not actually but you know what's odd when I do text to speech like convert it give me the old model voice I did can it be from installation or something ?
I wonder if I'm the only one who have this problem
That sounds strange, you could make a help forum in here to document your issue and see if someone with more knowledge could fix your issue
https://discord.com/channels/1159260121998827560/1192011222023950368
hey call me crazy, but I already processed all my audio clips into these and I want to ask
should I separate each individual sentence as an audio file because I can do that, or should I just put this single long audio file, like the screenshot I took, into the Applio?
I personally just use a singular audio file since applio already cuts it up automatically into 30s clips
Saves room on my pc
okay I agree to that
but hear me out
I already spaced out like the screenshot above
and should I still let Applio to cut it automatically?
and I want to ask is applio cutting into 30s clips is by default and can be adjusted or
Pretty sure you can adjust that to where it doesn't but I'm unsure on what exact setting you'd need to press for that tbh
Ping the helper role to see if anyone like Namari would know, they're pretty smart
I'm just kinda someone who knows how to use Applio and the voice changer stuff as well
Not super advanced knowledge:p
I have question in tts what voices do you use ? Because someone tell me the tts voices is from Microsoft and they don't work
I don't personally use the tts function often but you just use any model and type something on the tts section and it should sound like the model you're using
thanks for the tip
<@&1159293204038955078> any response would be greatful to this question idk what to do with the dataset I processed which is processed rather unusually (pics are right above)
Why won't Applio open? I'm going crazy!
no
if it's already cut you don't have to cut it
yes
though the default is enough
i forgot what it was
wich program and model are u using?
can i see a screen maybe?
i also have amd so i'll try
ok so lower the extra all the way down
and put index to 0.6
then put chunk to the max
also the tresh try to put 0.6
yeah but for now try with the longest then adjust urself
and keep the one that u prefer
i mean near the middle
a little to the right
this is so old
I got u 1 sec
@round fog idk if this is catfish
you're not one of these guys are you?
we are just making sure
voice models like the one you are using here are commonly used for catfishing/scamming
what tutorial have you watched?
exactly
ewwww
"ai girl voice changer" that is not what it's for 😭
I hate any yt videos like that
bro
just use a normal voice like a game or show character
don't use egirl things they're gross
@cosmic spire @dusty rampart can you guys check please?
and they have those weird pictures too
that video is very outdated
that's why what u have is also old
anything on yt about realtime voice changers are outdated since they don't keep up with current stuff
we are against
the catfishing
so if we see even the smallest thing we have to make sure
it's not hate against you
thank you for being normal ❤️
and also before anything there are some guides
applio is a similar program but updated
and it has more functionalities
sorry for my englis
italian here
for AMD I recommend Wokada tg fork tho Applio is a option I'm just unfamiliar with it voice changer wise
and also
ciao
you can update the drivers, if you have amd adrenaline
then also asio4all it's not bad
hahahaha veramente, comunque se vuoi parlare andiamo su chat che qua si fa assistenza
oki ma parliamo in inglese altrimenti ci cazziano
really, i have applio but if you say that wokada works well then i'll try it
cause i have some latency issues
nah
that's mainly for AMD, Vonovox is best for Nvidia if you have that
nono, since you have AMD you have Wokada TG and Applio's realtime
search the latest version online then install it, then in the window chose only update drivers and ur good
cause i have a pc for rendering, i mean threadripper and 6900xt
stupid bot broke
Guides for Programs that use RVC Models in Realtime for Calls/Games
A Realtime Voice Changer with similar performance to Wokada Tg-Develop Fork, with extra features, but it supports only Nvidia GPUs on Windows 10/11 unlike other options that have wider support. and without cloud options
A personal fork (modified version) of Wokada Deiteris Fork, it just adds some Quality of Life improvements to it like supporting Spin Embedder and Audio Effects. Don't expect too much about it since the creator made it originally as a personal project.
A Realtime Voice Changer with similar performance to Vonovox & Wokada Tg-Develop Fork, with extra features.
Deiteris' fork (modified version) of wokada that doesn't get updates anymore.
These options are not recommended for use.
Not suggested, older versions in youtube tuts are even way worse. GUIDE
The program is worse compared to the ones above, and much less updated. GUIDE
Guides for Programs that use RVC Models in Realtime for Calls/Games
A Realtime Voice Changer with similar performance to Wokada Tg-Develop Fork, with extra features, but it supports only Nvidia GPUs on Windows 10/11 unlike other options that have wider support. and without cloud options
A personal fork (modified version) of Wokada Deiteris Fork, it just adds some Quality of Life improvements to it like supporting Spin Embedder and Audio Effects. Don't expect too much about it since the creator made it originally as a personal project.
A Realtime Voice Changer with similar performance to Vonovox & Wokada Tg-Develop Fork, with extra features.
Deiteris' fork (modified version) of wokada that doesn't get updates anymore.
These options are not recommended for use.
Not suggested, older versions in youtube tuts are even way worse. GUIDE
The program is worse compared to the ones above, and much less updated. GUIDE
there it is
just the developer name of who made that version of wokada
late! go to time out
hahahahaha
here's the two downloads you'll need ^^
https://github.com/tg-develop/voice-changer/releases/download/b2397/voice-changer-windows-amd64-dml.zip
first is the voice changer second is a virtual audio cable to connect it to games and discord
you can find any voice you like here, use the most recent version of any you look for ^^
https://discord.com/channels/1159260121998827560/1175430844685484042
or just go here https://voice-models.com/
the voice model that u use does a LOT of the work to be fluid
ye that is also an option
I try
it's very setting dependent and software as well
worm may i ask your friend request maybe to ask for help in the future?
for amd ONNX is needed yea
voice models aren't for a specific type of gpu
they work on all
i need 3 latina, 5 egirl, 2 mommy, 1 asmr, 6 indian and 3 french girl voices and i want them NOW
hell yeah
just delete the old one so it doesn't interfere, the new program is completely separate from the one u had
yeah but just started too
use it if you wanna add cool voice effects to wokada, but the ai voices on voicemod are doo doo
that's EXACTLY how i started
oh hell naaawww
if you buy it, do the lifetime one and then just keep it for the soundboard lol
One message removed from a suspended account.
tbh my boyfriend is the one who knows all and he set me up a lot
paid gives infinite space tho lol
spamming it is for losers, use it for comedy and perfectly timing jokes
W partner
if you hit free limits, you can either try another tool or pay or run AI locally on your beefy PC
ai is a vast field
I tried running it locally but I think it isn't as effective, and I don't know how I can upload file/image in it then
huh?
elaborate:
- your pc os
- what are you trying to do: TTS, AI Covers, E Girl Trolling / Catfishing or Roleplay
- the tutorial link
Last update: March 23, 2026
im guessing the guy is already out here
are you sure you don't have any gpu at all?
Check:
- your pc gpu
- your pc os
- what are you trying to do: TTS, AI Covers, E Girl Trolling / Catfishing or Roleplay
ok scusami
che bello italianiii
there's barely any italian here, and i'm the only one that's also a staff currently
there were a bit more before the 2023 rvc ai cover hype died
I have 2 or three ai covers on my yt channel
most are over a year ago tho
hey does anyone know something like Zorq AI's motion control but free or client-side? such as facefusion?, but id like something that doesnt replace only the face
is there any way to use with intel arc? i have Intel(R) Arc(TM) B580 Graphics
thanks a lot so appolio cut them automatically and then stitch them together in the end like my screenshot
Which program?
RVC.

