#How to hear myself
On your voice changer, the chunk value is set far too high, which delays the audio by about 2.1 seconds (2100 ms). Try reducing the chunk value to 130 ms and set the Extra value to 2.7 s. Also check that your "monitor" output is set to your speakers or headphones so you can hear the program.
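As a rough way to reason about that delay: a chunk has to be captured in full before it can be converted, so the chunk length is a lower bound on the added latency. The sketch below only does the samples-to-milliseconds arithmetic. The function names and the 48 kHz sample rate are assumptions for illustration, not values read from the voice changer.

```python
def chunk_ms(chunk_samples: int, sample_rate_hz: int) -> float:
    """Duration of one audio chunk in milliseconds."""
    if sample_rate_hz <= 0:
        raise ValueError("sample_rate_hz must be positive")
    return 1000.0 * chunk_samples / sample_rate_hz


def chunk_samples_for(target_ms: float, sample_rate_hz: int) -> int:
    """Number of samples that fit in target_ms at the given sample rate."""
    if sample_rate_hz <= 0:
        raise ValueError("sample_rate_hz must be positive")
    return round(target_ms * sample_rate_hz / 1000.0)


# Assumed 48 kHz device rate (hypothetical; check your own audio settings).
RATE = 48000
print(chunk_ms(100800, RATE))        # duration of a 100800-sample chunk
print(chunk_samples_for(130, RATE))  # samples in a 130 ms chunk
```

The same arithmetic works for any sample rate; only the assumed `RATE` changes.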
Can you share a full screenshot?
The program is working; look at the performance stats in the top-left corner of the screen. What do you mean you can't hear the program?
it just doesn't work, how can i show you
is it supposed to be working according to the stats?
i switched to server and now i can hear it but it has bad quality
Can't help with an E-gurf voice model, but try another anime voice model from #1175430844685484042.
Set F0 to rmvpe_onnx, not something else.
it works in server kinda, client doesn't tho, what's the difference between server and client
do you have any advice to make it sound good or should i test it myself
it sounds a bit robotic and i get a little bit of echo
Or try this newer W-Okada fork. https://github.com/tg-develop/voice-changer/releases/download/b2397/voice-changer-windows-amd64-dml.zip
can i keep both versions or do i have to delete the current one
Extract the newer one to a different folder and keep the older one, just in case the newer one fails.
I'll test it in a few hours, if it goes well I'll just mark this as solved
By the way, does a better mic improve the voice quality?
That's one myth about the voice changer. A better microphone doesn't improve the quality of the generated voice in the final result, though a better-working microphone can sometimes suppress a bit of the background noise reaching it, regardless of the microphone's price.
So it's more important to look for noise suppression microphones with an alright quality instead of a mic with insane quality?
Though the latter may also have nice noise suppression
A physical microphone doesn't have a noise suppression feature of its own; noise suppression and echo cancellation are usually done through software-level settings. Some slightly more expensive dedicated microphones have a built-in AI noise-removal processing unit, but as I said, these devices will be more expensive than the non-AI ones.
Do you have any advice for people with noisy mics? I feel like my mic doesn't have good noise suppression
Or how should I manage this issue
By the way, robotic-sounding audio sometimes has to do with the voice model itself, even if the extra value is set to the recommended 2.7 s on the W-Okada fork. That doesn't necessarily mean the model was badly trained, but I once heard that "low" dataset audio in RVC voice model training might be the issue.
Oh maybe, but if that's the issue I'll end up using others, right now I just downloaded one for testing, but ty, I'll try the newer version in some hours though client probably still won't work
Is server better than client or what's the difference?
Audio modes in voice changer: "Client" is simpler and lets you use the voice changer's noise suppression and echo cancellation options, but it generally has higher audio latency because it mainly uses the older "MME" audio API. "Server" is more complex but more flexible: it lets you pick any sample rate and audio API (MME, WASAPI, ASIO), although the program sometimes fails if it is set to WASAPI with a mismatched sample rate.
When you set the audio mode to "server", the noise/echo suppression options are greyed out and unavailable. This is a known quirk in every W-Okada version.
should i add index and pth if a model has both
The index changes how the voice's speech sounds and how much this blends with your own voice.
It is optional
index files tend to be large, and if you're going to use them you may need to adjust the chunk size to compensate, since it takes more time to process the voice with the index enabled.
Only the pth file is required.
it runs well but now when i talk it cuts off the last second of the phrase, or sounds robotic at the end
a lot of the time thats happening
nvm its robotic
If you changed the extra, chunk size or crossfade, you should swap your processor once to cpu, and then back to gpu0 to fix a bug
that should help with lagginess, choppiness and huge amount of processor usage
for robotic sounds... idk, it could be the model, it could be a filter, it could be software
you might be able to fix some of it by speaking more loudly into the microphone or adjusting the input volume
you can try to fix some of it by adjusting extra or crossfade, but don't forget to swap processor (cpu and then back to gpu)
it might help to lower protect to 0.30 or lower
but do also check your microphone settings; it's possible it's caused by some setting.

There are some simple workarounds if you still care about getting the best possible audio quality. Click "stop server" first, go to advanced settings, set "Crossfade overlap" to 0.15 and enable "Force fp32".
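For context on what a crossfade overlap does: consecutive converted chunks are blended over a short overlapping region so the seam between them is less audible. The sketch below is a plain linear crossfade over lists of samples. It is an illustration under that assumption, not the voice changer's actual implementation; `crossfade` is a hypothetical helper and the 48 kHz rate is assumed.

```python
def crossfade(prev_tail: list[float], next_head: list[float]) -> list[float]:
    """Linearly fade out prev_tail while fading in next_head."""
    if len(prev_tail) != len(next_head):
        raise ValueError("overlap regions must have the same length")
    n = len(prev_tail)
    out = []
    for i in range(n):
        # Weight of the new chunk rises from 0 to 1 across the overlap.
        w = i / (n - 1) if n > 1 else 1.0
        out.append((1.0 - w) * prev_tail[i] + w * next_head[i])
    return out


# Overlap length for a 0.15 s crossfade at an assumed 48 kHz sample rate.
overlap_samples = round(0.15 * 48000)
print(overlap_samples)
print(crossfade([1.0, 1.0, 1.0], [0.0, 0.0, 0.0]))
```

A longer overlap gives a smoother seam but leaves less of each chunk unblended, which is one reason the value is worth tuning rather than maximizing.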
Ty I'll try this, by the way does internet connection matter at all?
It is true that I didn't do this, but the earlier version I used had a warning about this bug and the newer one didn't, so I thought the bug didn't apply anymore
When you run W-Okada voice changer locally, an internet connection is **required** to download some files, while most actual functions (like converting your voice in realtime) generally don't need internet. The online cloud options (like Colab and Kaggle), however, require internet for everything, not just for the voice changer that runs within the service.
As I implied, the internet has nothing to do with the audio quality. Aside from the basic settings and the RVC voice model, how would you know whether something else is affecting the audio quality in the voice changer?
Do you mean if there's anything I believe may be causing bad quality?
If so, maybe the mic in my headphones is kinda mid
Should I record and send it here?
Like an .mp3, .wav or wtv
Still believing like that?
Wdym
Do you still think your microphone is the issue in all of this? We've had a discussion earlier.
Yes I remember, I thought it kinda influenced the input but if you say so I'll stop focusing on my mic, though should I send a test audio? Reading whatever?
Maybe I'm tripping and it's not that bad but I'm not sure, I'm not an expert
It could help see what exactly you mean with robotic noise.
The voice changer should only be reacting to voice though, not to noise. I'm not sure how your microphone by itself would be causing a robotic sound. It could be caused by background noise instead, and I've noticed that some lower-quality models output a robotic-like sound on certain sounds, or at random intervals when using the index.
Some people speak really softly; whispering, I mean. Some models cannot handle whispers and produce a robotic-like sound in their place. That is why I mentioned that speaking louder might help.
a better microphone improves the quality of the model due to the input being resampled to 16k
if your microphone sounds muddy and bad at that sr then the model will struggle and have different problems like bad pronunciation, robotic sound, etc
but if your microphone is clean and high quality then the model (more specifically, the embedder) is going to have an easier job translating your voice to the model's voice, giving better results
some models are just bad and robotic tho, that cannot be fixed
ah and the index file it's just a file (xD) that stores the accent found in the dataset
index at 0 makes the model use the accent/pronunciation of the source audio (your voice)
if you set a value higher than that the model is going to blend the index file's pronunciation into the result
but in realtime i've heard it screws up pronunciation most of the time, it's more useful in non-realtime inference
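One way to picture the index setting is as a linear blend between features extracted from your voice and features retrieved from the index. The sketch below assumes the blend is a simple weighted average. The vectors are made-up numbers and `blend_features` is a hypothetical helper, not a function from RVC or the voice changer.

```python
def blend_features(source: list[float], retrieved: list[float],
                   index_ratio: float) -> list[float]:
    """Mix index-retrieved features into the source features.

    index_ratio = 0.0 keeps only the source (your own pronunciation);
    index_ratio = 1.0 keeps only what was retrieved from the index.
    """
    if not 0.0 <= index_ratio <= 1.0:
        raise ValueError("index_ratio must be between 0 and 1")
    if len(source) != len(retrieved):
        raise ValueError("feature vectors must have the same length")
    return [index_ratio * r + (1.0 - index_ratio) * s
            for s, r in zip(source, retrieved)]


# Made-up feature vectors, for illustration only.
mine = [1.0, 2.0]
from_index = [3.0, 4.0]
print(blend_features(mine, from_index, 0.0))
print(blend_features(mine, from_index, 0.5))
```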
my bad for the slow reply, here's a demonstration, don't mind what i read, it was a random newspaper, but you'll quickly get what happens
It does sound like an issue with the model; for some reason it seems to have trouble determining the vocal pitch.
Can you try a different voice model and see if you still get this odd effect?
This is the only girl voice I've seen that is good
I feel like I found the solution, slightly at least
I have to talk more clearly and calmly
If I speak too quick or too high or low it glitches
And slightly loud
Not too loud just not whispering or speaking unclearly
One tip would be; try to sit in front of the microphone, with your mouth I mean
and figure out from what angle the sound enters it best; some microphones have an odd angle. The backside will be less loud than the front.
You should be able to be heard clearly with about a fist distance. (2 to 6 inches)
With a headset, keep the microphone very close to your cheek, at the corner of your mouth.
Also, try adjusting the microphone volume or gain if you can (on the hardware), if not you can increase the input volume on the voice changer, or use effects like the audio compressor to force it to take your voice in louder.
Personally I prefer to use OBS's filters, which do the same thing.
Voice Effects are basically adjustments to the audio received or sent.
Gain, for example, makes you louder (or less loud if you use negative decibels)
Limiter prevents your loudness from going over a certain value
A compressor effect is basically a conditional effect, which pushes the audio quieter or louder when it is detected at a specific decibel level.
In the past these types of 'effects' were only done by people through expensive hardware, or through expensive VST plugins.
or by programmers, if they know their way around complicated tools that adjust the audio driver directly (well almost)
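The gain, limiter and compressor effects described above can be sketched as simple arithmetic on levels in decibels. This is a minimal illustration of the textbook definitions, not the code behind the voice changer's or OBS's effects. The function names and example values are assumptions, and the compressor shown only turns down levels above the threshold.

```python
def db_to_linear(db: float) -> float:
    """Convert a decibel change to an amplitude multiplier."""
    return 10.0 ** (db / 20.0)


def apply_gain(level_db: float, gain_db: float) -> float:
    """Gain: shift the level up (positive dB) or down (negative dB)."""
    return level_db + gain_db


def limit(level_db: float, ceiling_db: float) -> float:
    """Limiter: never let the level exceed the ceiling."""
    return min(level_db, ceiling_db)


def compress(level_db: float, threshold_db: float, ratio: float) -> float:
    """Compressor: shrink the part of the level above the threshold."""
    if ratio < 1.0:
        raise ValueError("ratio must be at least 1")
    if level_db <= threshold_db:
        return level_db  # below the threshold nothing changes
    return threshold_db + (level_db - threshold_db) / ratio


print(apply_gain(-12.0, 6.0))       # louder by 6 dB
print(limit(-1.0, -3.0))            # clamped to the ceiling
print(compress(-6.0, -18.0, 4.0))   # 12 dB over the threshold becomes 3 dB over
```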
Are these placed like along with the chunk etc settings or where are they located visually
Within the menu
This is where the "audio effects" are located. Click on the "+" button to reveal the "add audio effect" pop-up.
where should i put this, output or input
Output.
is there any recommended settings for compressor
While I know how "compressor" effect works in audio engineering, I'm not sure how to explain about this one.
Click stop server, scroll down, and you'll see a red trash can icon on that FX.
it's like merging with the ui
If you resize your browser window too small, the voice changer UI will look squeezed.
oh yh fixed now
is this a common issue? why can't the ai say hello, or why are some specific words harder
This thread is going to be a long one: one issue gets solved, then another one comes in, and so on.
my bad
Depends on the model and how it was trained. Some models have certain tones louder than others even if you clearly speak the same volume. You can fix this with the compressor effect, but don't forget to add gain since the compressor also lowers the output volume slightly.
If you feel like you sound too soft, you can also use an expander, but be a bit careful where you place the threshold when using both of these. Your audio only needs to be somewhat more equal, not completely equalized. If everything has the same loudness then you won't sound natural.
EQ can help with things as well. Human voices have by default stronger midtones than low and high ones; you can bring them closer to each other by adding or removing some volume (decibels)
A good Equalizer and Compressor setting will make you sound as if you were talking in a studio, but you'll need to find the correct settings for each voice. Unfortunately Audio Effects are shared, which is a flaw in tg-develop's design.
the settings depend on the voice model
the default settings can be okay, but you may want to make the threshold smaller, reduce the attack and increase the release values so that it acts quicker and compresses more.
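To make attack and release a bit more concrete, here is a minimal sketch of a one-pole envelope follower, a common way for a compressor to track loudness over time. It is a generic textbook form and an assumption about how such an effect might work, not the voice changer's implementation. The names and the 48 kHz rate are hypothetical.

```python
import math


def smoothing_coeff(time_ms: float, sample_rate_hz: int) -> float:
    """One-pole smoothing coefficient for a given time constant."""
    if time_ms <= 0:
        return 0.0  # react instantly
    return math.exp(-1.0 / (time_ms / 1000.0 * sample_rate_hz))


def follow_envelope(samples: list[float], attack_ms: float,
                    release_ms: float, sample_rate_hz: int) -> list[float]:
    """Track signal level: attack applies when rising, release when falling."""
    attack = smoothing_coeff(attack_ms, sample_rate_hz)
    release = smoothing_coeff(release_ms, sample_rate_hz)
    env = 0.0
    out = []
    for x in samples:
        level = abs(x)
        coeff = attack if level > env else release
        env = coeff * env + (1.0 - coeff) * level
        out.append(env)
    return out


# A shorter attack makes the envelope catch up with a sudden loud sound sooner.
burst = [1.0] * 100
fast = follow_envelope(burst, attack_ms=1.0, release_ms=50.0, sample_rate_hz=48000)
slow = follow_envelope(burst, attack_ms=10.0, release_ms=50.0, sample_rate_hz=48000)
print(fast[-1] > slow[-1])
```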
Try making your own thread in #1192011222023950368.
sorry slow answer do u mind taking this to dms? i dont want the thread to be too long
So does this mean the initial issue is finally solved?