#Limitations of w-okada/realtime conversion

quaint needle
#

Hi, I’ve noticed that models consistently struggle with certain sounds. In my tests, picking any random model and just saying a drawn-out “heyyyyy” is almost guaranteed to break the conversion, even in models that perform excellently otherwise. The result is usually some garbled “heyyuuuuaaaaaaaeeeee”.

I’m wondering if this is a limitation of RVC2, or just a lack of training data with drawn-out vowels?

I can reproduce this with any model I’ve tried so far, so I don’t think it’s me, but I’d happily try any suggestions.

median escarp
#

Might be a limitation of realtime conversion OR just the lack of training data with that type of sound.

smoky salmon
#

Some models don't like it if you hold a single sound for long, e.g. it's fine if it's just "hey", but when you say "heeeeeyyy..." it will break

#

Not all models tho

wicked igloo
#

is it stuttering or slow?

quaint needle
#

Not stuttering or slow, no.
I guess if not all models do this then it’s a matter of training data, but I legit didn’t find a single one without this issue.

wicked igloo
#

you know the setting where it delays the voice changer?

#

could it be that? maybe that needs to be increased or decreased?

quaint needle
#

Nono, I’m quite confident that everything is set up correctly

99% of the voice conversion is fine, it just struggles with very specific input sounds. In my case, just a drawn-out “heeeeeyyy” is almost guaranteed not to be converted as you’d expect.

Maybe I’ll try to clone my own voice to see if it’s just a lack of training data or really a realtime RVC2 problem

smoky salmon
#

Try my Nekrolina model, it never did that for me on that model, so ye

quaint needle
#

will do in a bit, thanks for the tip!

smoky salmon
#

I really think that's just a model problem

#

Also screenshot your settings just in case it's a setting problem

quaint needle
#

Haven't tried yours yet. Here are my settings (v1.5.3.15 onnxgpu-cuda)

#

Happens with Nekrolina too. Here's a small example with a bit of normal speech to prove it works fine otherwise

#

Obviously it's a bit exaggerated there to really show it. Sometimes it's way worse than in this clip

quaint needle
#

I have never tried the RVC client for realtime conversion before, but I will try and report back. Definitely going to try all your other tips too! Thank you

#

ahhh ok, got it

quaint needle
#
  • Setting mic sample rate to model: ❌
  • Balancing gain: ❌
  • Switching mic: ❌
  • Reducing background noise (manually or Sonar ClearCast AI): ❌
  • Trying RVC: ✅ Works flawlessly. Not a single issue. Absolutely perfect. Also: No idea what to do with this information 😅
#

Yep, it created a new input device that I then used in Okada

#

So just a limitation of realtime RVC then? Do you think it's worth giving the realtime RVC client a try?

#

Okay, I guess at this point it's worth a try just to rule out all possibilities. If I find something I'll update here, but thank you for all the help in any case 🙂

quaint needle
#

Hey, I just wanted to leave a final comment here if anyone is curious:

The realtime RVC client didn't help, but changing the "Extra" setting in w-okada had a huge impact. Lowering it to 8k or even 4k made the problem described here much worse. Increasing it to 65k or 131k almost solved it completely.

Most guides and advice I've seen here say not to go past 16k, but in my case going higher is what actually solved my issue.
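
For anyone wondering what those "Extra" values mean in terms of time, here is a rough sketch. It assumes the setting is a sample count (4k = 4096, 131k = 131072, and so on) and that audio runs at 48 kHz; both are assumptions on my part, not something confirmed in this thread, and the function names are made up for illustration. Under those assumptions, the low settings give the model well under a quarter of a second of extra context, which is shorter than a drawn-out vowel, while the high settings give it over a second.

```python
# Rough arithmetic only: converts an "Extra" value to milliseconds of audio.
# ASSUMPTIONS (not confirmed by the thread): the setting is a sample count,
# and the processing sample rate is 48 kHz.

EXTRA_SETTINGS = {
    "4k": 4096,
    "8k": 8192,
    "16k": 16384,
    "65k": 65536,
    "131k": 131072,
}


def context_ms(extra_samples: int, sample_rate_hz: int) -> float:
    """Duration in milliseconds covered by `extra_samples` at `sample_rate_hz`."""
    return 1000.0 * extra_samples / sample_rate_hz


def context_table(sample_rate_hz: int = 48_000) -> dict[str, float]:
    """Map each setting label to its duration in ms, rounded to one decimal."""
    return {
        label: round(context_ms(samples, sample_rate_hz), 1)
        for label, samples in EXTRA_SETTINGS.items()
    }


if __name__ == "__main__":
    for label, ms in context_table().items():
        print(f"{label:>5}: {ms:8.1f} ms of extra context")
```

If your setup runs at a different rate, pass it in, e.g. `context_table(40_000)`. This only shows how much audio each value covers; it does not prove that short context is the cause of the garbled vowels.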