#How Realtime Voice changer W-okada


frail forge
#

Hey, about 6 months ago I used the W-okada voice changer with some models from this Discord.

I got a new GPU (4090) and wanted to ask whether realtime translation is possible now, or whether it would take a 5090?

In the past I used a 4080 Super and the voice changer worked well, but it took about 3 seconds after I finished a sentence for it to be spoken, which is just too long and makes conversations impossible.

So my goal is to achieve realtime translation during games, or even when I am just on the desktop, but I don't know if it's a hardware or a software issue.

#

I didn't use any specific guide, but I know that if it's realtime it sounds awful, and if I increase the chunk size and such it sounds better but takes longer. With current tech, shouldn't it be possible to achieve realtime voice changing?
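The chunk trade-off described above can be put into rough numbers. This is only a simplified sketch, not how W-okada itself measures delay: the function name, the four delay components, and all the millisecond values are assumptions picked for illustration.

```python
def estimated_delay_ms(chunk_ms, inference_ms, crossfade_ms=0, io_buffer_ms=0):
    """Rough end-to-end delay for chunked voice conversion.

    Assumed model: a full chunk of audio has to be recorded before it can
    be converted, then inference, crossfading and audio buffers add more.
    """
    if inference_ms > chunk_ms:
        # The converter cannot keep up with incoming audio.
        raise ValueError("inference is slower than the chunk length")
    return chunk_ms + inference_ms + crossfade_ms + io_buffer_ms


# Hypothetical numbers, for illustration only.
small_chunk = estimated_delay_ms(chunk_ms=100, inference_ms=40,
                                 crossfade_ms=50, io_buffer_ms=20)
large_chunk = estimated_delay_ms(chunk_ms=500, inference_ms=60,
                                 crossfade_ms=50, io_buffer_ms=20)
print(small_chunk, large_chunk)
```

Under this model a bigger chunk gives the model more context per pass (better sound) but adds directly to the delay, while a faster GPU only shrinks the inference part.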

solar idol
#

RVC = Retrieval-based Voice Conversion, the best few-shot speech-to-speech AI models (on v2). It runs inference (uses models) on pre-recorded audio (AI covers) and trains (makes) models. Technically, mainline RVC does have a go-realtime.bat (aka RVC-GUI), but it's pretty messy and outdated, so it's strongly discouraged for realtime. There are also updated forks with extra features, like Applio.

Wokada = uses RVC for realtime inference. There are 2 main versions: the original made by Wok, and the Deiteris fork (a modified version), which is the most suggested one.

Vonovox = another realtime voice changer based on RVC, with similar quality and performance to the Wokada Deiteris fork, but with other perks.

#

I'm guessing you're on windows

#

what do you want to do? because there's another program called tiger whisper ui for realtime translations

frail forge
#

Roleplaying with friends.

Like in the discord fooling around with friends, in imaginary scenarios.

So I would be gaming while using the voice changer.

And my goal is to be able to have conversations with friends, without waiting times. Yes I am on windows

Is vonovox better? Or what are the other perks?

#

Thanks in advance 🙏

solar idol
#

be sure to not use that one

solar idol
#

-realtime

cinder jungleBOT
# solar idol -realtime
💻 Local Realtime RVC

Guides for Programs that use RVC Models in Realtime for Calls/Games

• Wokada Deiteris Fork

Most suggested WebUI with the best general support for many platforms. GUIDE

• Vonovox

A Realtime Voice Changer with similar performance to Wokada Deiteris Fork and extra features, but supported only on Nvidia GPUs on Windows and without cloud options. GUIDE

⚔️ Wokada Deiteris Fork vs Vonovox

For Windows with Nvidia, both Wokada Deiteris Fork and Vonovox have similar performance & quality. Users should read the pros and cons of both and choose based on their differences, such as the UI and Vonovox's paid effects.
Read Wokada Deiteris Fork Pros&Cons & Vonovox Pros&Cons

⛔ Outdated/Discouraged

These options are not recommended for use.

• Original Wokada

Not suggested; the older versions shown in YouTube tutorials are even worse. GUIDE

• RVC GUI Mainline Realtime

The program is worse than the ones above and much less updated. GUIDE

solar idol
#

get either vonovox or wokada deiteris fork

frail forge
#

thx

frail forge
#

I will do that

frail forge
#

works great! I only have 217ms

#

That's outside of games. With games and other processes running I still need to test, but I guess it would be around 300-400 ms.

frail forge
#

The games would be something like GTA 5 or an FPS, not Cyberpunk 2077 or anything like that.

solar idol
# frail forge sure??? Even on a 4090?

Your GPU usually prioritizes the game, since it's actually rendering something on the screen, unlike the AI program. That's why we usually suggest that to everyone for the lowest delay.

#

I mean, you can try it in games and let me know. I don't remember anyone playing Cyberpunk 2077 in 4K and having low delay, so I dunno, also because I don't have the same GPU as you.

frail forge
#

A German male model, which I am using because there aren't a lot.

solar idol
# frail forge A German male model, which I am using because there aren't a lot.

On the Wokada Deiteris fork, you can **optionally** use more advanced settings for extra benefits:

  • Advanced Settings -> Force FP32 mode: on (THIS IS OFF BY DEFAULT! Turning this on improves stability. Increases VRAM usage by 200 MB)
  • Advanced Settings -> Disable JIT compilation: off for faster loading of the program, on for slightly better performance (10-15 ms, Nvidia only)
  • Advanced Settings -> Crossfade Length: controls how smoothly the AI stitches the processed parts ("chunks") of your voice back together. 0.1 for the fastest voice; 0.15 for improved quality, but it increases delay by ~50 ms
  • Reduce the delay on Windows via the WASAPI / ASIO Guide
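
To picture what the crossfade setting in the list above does, here is a minimal sketch of stitching two processed chunks with a linear crossfade. It is an illustration only: the function names are made up, and the actual program may well use a different fade curve and buffer layout.

```python
def crossfade(prev_tail, next_head):
    """Linearly blend two equally long overlapping sample lists."""
    n = len(prev_tail)
    if n != len(next_head):
        raise ValueError("overlap regions must have the same length")
    blended = []
    for i in range(n):
        w = (i + 1) / (n + 1)  # weight of the new chunk, rising towards 1
        blended.append((1.0 - w) * prev_tail[i] + w * next_head[i])
    return blended


def stitch(chunks, overlap):
    """Join chunks, blending `overlap` samples at every boundary."""
    if overlap < 1:
        raise ValueError("overlap must be at least 1 sample")
    result = list(chunks[0])
    for chunk in chunks[1:]:
        # Fade the old chunk out while the new chunk fades in.
        result[-overlap:] = crossfade(result[-overlap:], chunk[:overlap])
        result.extend(chunk[overlap:])
    return result


# Two toy "chunks": a loud one followed by a silent one.
loud = [1.0] * 6
silent = [0.0] * 6
joined = stitch([loud, silent], overlap=3)
print(joined)
```

A longer overlap hides the seam between chunks better, but the overlapping audio has to be available before the blend can happen, which fits the extra delay mentioned for the 0.15 setting.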
#

the only setting you have to modify per model is the pitch, which you gotta play with
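
As a starting point for playing with the pitch, the usual convention (assumed here, check the program's own guide) is that the pitch value counts semitones, where 12 semitones double the frequency. The sketch below estimates a shift from two average speaking pitches; both Hz values are hypothetical.

```python
import math


def semitones_to_ratio(semitones):
    """Frequency ratio produced by a shift of `semitones`."""
    return 2.0 ** (semitones / 12.0)


def ratio_to_semitones(ratio):
    """Number of semitones corresponding to a frequency ratio."""
    return 12.0 * math.log2(ratio)


# Hypothetical average speaking pitches in Hz (not measured values).
my_pitch_hz = 140.0
model_pitch_hz = 110.0

suggested_shift = round(ratio_to_semitones(model_pitch_hz / my_pitch_hz))
print(suggested_shift)
```

The result is only a first guess; fine-tuning by ear per model is still needed.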

frail forge
#

English model which everyone uses, I am just testing the english and german language and with basic models.

The only thing that kinda annoys me is that it sounds a bit off. For my German model, the guy sounds deep, while I sound like a child version of him. If I decrease the pitch by one it gets really deep, and at +1 I sound like an annoying child.

frail forge
#

Everything else is great, and the delay is really low, which makes me happy. In the earlier days it was awful.

solar idol
#

play with the pitch and also check for other models

frail forge
#

to recommend?

solar idol
#

Just know that RVC has limitations: it can't always reproduce non-speech sounds realistically, such as some types of laughs or screams.

frail forge
#

Okay thx, last question: what is formant shift? I don't get it.

#

Is it the same as pitch? Or no?

solar idol
#

lemme do an example

#

changing the pitch is like using a different piano key

#

changing the formant shift would be like using the same key but with pianos made out of different materials: you're changing only the timbre
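
The piano analogy can be sketched with a toy voice model: pitch sets where the harmonics sit, while formants are resonance peaks that decide which of those harmonics come out loudest. Everything here (the bell-shaped envelope, the formant frequencies, the 1.2 shift factor) is a made-up illustration, not how RVC or any voice changer implements it.

```python
import math


def envelope_gain(freq_hz, formants_hz, bandwidth_hz=150.0):
    """Toy spectral envelope: one bell curve per formant, lower ones stronger."""
    return sum(
        math.exp(-((freq_hz - f) / bandwidth_hz) ** 2) / (i + 1)
        for i, f in enumerate(formants_hz)
    )


def harmonic_freqs(f0_hz, count=20):
    """Harmonics of a voiced sound sit at whole multiples of the pitch f0."""
    return [f0_hz * k for k in range(1, count + 1)]


def loudest_harmonic(f0_hz, formants_hz):
    """Harmonic that the envelope boosts the most."""
    return max(harmonic_freqs(f0_hz), key=lambda f: envelope_gain(f, formants_hz))


formants = [500.0, 1500.0, 2500.0]            # hypothetical resonances in Hz
shifted_formants = [f * 1.2 for f in formants]  # formant shift: envelope moves

# Pitch change: every harmonic moves, the envelope stays where it was.
print(harmonic_freqs(100.0)[:3], harmonic_freqs(200.0)[:3])

# Formant shift: same harmonics (same "piano key"), different one is boosted.
print(loudest_harmonic(100.0, formants), loudest_harmonic(100.0, shifted_formants))
```

So with the same pitch, shifting the formants changes which overtones dominate, which is heard as a different timbre rather than a higher or lower note.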

frail forge
#

interesting, thx

frail forge
#

yes, thank you