#Voice Question


surreal pewter
#

What if a voice was trained on both the new and old voice sets? I also wonder what would happen if you used different ratios of each to tune it. I imagine it would end up sounding like some eldritch abomination, but it would be interesting to test. It might be surprising, with good elements of both.

reef nest
#

there's a "questions" tag, use that next time please 🙏

tawdry herald
#

It would make more sense to train a vocoder like HiFi-GAN or BigVGAN on Neuro's current voice, then run whatever TTS Vedal plans to use for the expressiveness through it. Whether or not the TTS is local will determine the steps in the pipeline, such as whether there are unnecessary conversions from wav to mel and vice versa.

hot quiver
#

It would be better if Vedal adjusted the voice bit by bit every Neuro stream. That way viewers can gradually adjust to the new voice without noticing a thing. It will probably take at least a month of streams.

hidden storm
#

unfortunately it is not this simple 😔

vagrant socket
#

can neuro use the new voice for friday

quick pecan
#

IMHO this can definitely do it: https://github.com/152334H/tortoise-tts-fast. As suggested above, it uses BigVGAN for expressiveness. For singing, though, this other one works well since you will pre-record it beforehand: https://github.com/34j/so-vits-svc-fork


tawdry herald
#

tutel tts

tawdry herald
#

probably because of bigvgan

quick pecan
# tawdry herald i looked at tortoise and it seems that it only outputs at 22050hz

wait what.. you can hear above 20,000 hz? vedalHappi vedalHappi vedalHappi https://www.youtube.com/watch?v=PAsMlDptjx8

tawdry herald
#

sample rate of 22050hz

#

that's like half of the usual 44.1 kHz or 48 kHz
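For anyone mixing up sample rate and hearing range: a sample rate only captures frequencies up to half its value (the Nyquist limit). A minimal sketch of that arithmetic, where the helper name `nyquist_hz` is made up for illustration:

```python
def nyquist_hz(sample_rate_hz: int) -> float:
    """Highest frequency that audio sampled at this rate can represent."""
    return sample_rate_hz / 2


# Compare the rate mentioned for tortoise with the two common rates.
for rate in (22050, 44100, 48000):
    print(f"{rate} Hz sample rate -> content up to {nyquist_hz(rate):.0f} Hz")
```

So 22050 Hz audio tops out around 11 kHz of actual content, well below the roughly 20 kHz upper limit of human hearing.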

quick pecan
#

you can change it here: reference_clips = [utils.audio.load_audio(p, 22050) for p in clips_paths]

tawdry herald
#

is it that easy?

#

pretty sure the input and output size of the model need to change as well

#

cuz the tensor size changes
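A rough sketch of why the sizes change: the same clip has a different number of samples, and so a different number of spectrogram frames, at each sample rate. The hop length of 256 and the `1 + n // hop` frame count are assumptions for illustration, not values taken from tortoise.

```python
def num_samples(duration_s: float, sample_rate_hz: int) -> int:
    """Number of audio samples in a clip of the given duration."""
    return round(duration_s * sample_rate_hz)


def num_frames(n_samples: int, hop_length: int = 256) -> int:
    """Frame count of a centered STFT with the given hop (assumed convention)."""
    return 1 + n_samples // hop_length


# Same 3-second clip, loaded at two different sample rates.
for rate in (22050, 44100):
    n = num_samples(3.0, rate)
    print(f"{rate} Hz: {n} samples, {num_frames(n)} frames")
```

Even where a model accepts variable-length input, it was trained on features computed at one specific rate, so changing only the loading rate would feed it features it has never seen.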

quick pecan
#

but it's TTS, it can't get any better. it's a waste

tawdry herald
#

Not really

#

I mean neuro can generate both dictionary and audio

quick pecan
#

yeah, i found the same voice in Azure, you can make it sing with SSML tags
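A hedged sketch of what "singing with SSML" could look like: one standard `prosody` element per syllable, each with an absolute pitch in Hz. The voice name `en-US-ExampleNeural` is a placeholder, not the voice mentioned above, and whether the result actually sounds like singing depends on the voice and the service.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_sung_ssml(notes, voice_name="en-US-ExampleNeural"):
    """Build an SSML string with one <prosody pitch="..Hz"> per (syllable, hz) pair.

    voice_name is a hypothetical placeholder; substitute a real voice.
    """
    speak = ET.Element(
        "speak",
        {"version": "1.0", "xmlns": SSML_NS, f"{{{XML_NS}}}lang": "en-US"},
    )
    voice = ET.SubElement(speak, "voice", {"name": voice_name})
    for syllable, pitch_hz in notes:
        prosody = ET.SubElement(voice, "prosody", {"pitch": f"{pitch_hz}Hz"})
        prosody.text = syllable
    return ET.tostring(speak, encoding="unicode")


# Two syllables on A4 (440 Hz) and roughly B4 (494 Hz).
print(build_sung_ssml([("la", 440), ("la", 494)]))
```

The resulting string would then be sent to whatever TTS endpoint is in use; that part is left out here since it depends on the account and SDK.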

quick pecan
# quick pecan IMHO this can definitely do it https://github.com/152334H/tortoise-tts-fast, as ...

but for singing it's better to use https://github.com/34j/so-vits-svc-fork, which has the best vocal cloning, because you can record the performance beforehand anyway. the audience will never know the difference


maiden marten
#

Interesting

tawdry herald