#Voice Question
there's a "questions" tag, use that next time please 🙏
It would make more sense to train a vocoder like HifiGAN or BigVGAN on neuro's current voice, then run whatever TTS Vedal plans to use for the expressiveness through it. Whether or not the TTS is local will determine the steps in the pipeline, for example whether you can skip unnecessary conversions from wav to mel and back.
It would be better if vedal adjusted the voice bit by bit every neuro stream. That way viewers can gradually get used to the new voice without noticing a thing. It will probably take at least a month of streams
unfortunately it is not this simple 😔
can neuro use the new voice for friday
nope, sorry
ok
IMHO this can definitely do it: https://github.com/152334H/tortoise-tts-fast. As suggested above, it uses BigVGAN for expressiveness. For singing, though, this other one works well since you will pre-record it beforehand: https://github.com/34j/so-vits-svc-fork
tutel tts
i looked at tortoise and it seems that it only outputs at 22050hz
probably because of bigvgan
wait what.. you can hear above 20,000 hz?
https://www.youtube.com/watch?v=PAsMlDptjx8
20Hz to 20,000Hz is commonly considered to be the range of human hearing.
no
sample rate of 22050hz
thats like half of the usual 44.1khz or 48
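To untangle the two numbers being mixed up here: a sample rate is not the highest pitch in the audio. By the Nyquist limit, audio sampled at a given rate can only carry frequencies up to half that rate. A minimal sketch of the arithmetic (the function name is just for illustration):

```python
def nyquist_hz(sample_rate_hz: int) -> float:
    """Highest frequency that audio sampled at this rate can represent."""
    return sample_rate_hz / 2


# Compare the rate mentioned for tortoise with the two common rates.
for rate in (22050, 44100, 48000):
    print(f"{rate} Hz sample rate -> content up to {nyquist_hz(rate):.0f} Hz")
```

So 22050 Hz output tops out around 11 kHz, well below the roughly 20 kHz upper limit of hearing, while 44.1 kHz covers the full range.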
you can change it here: reference_clips = [utils.audio.load_audio(p, 22050) for p in clips_paths]
is it that easy?
pretty sure the input and output size of the model need to change as well
cuz the tensor size changes
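A rough sketch of why changing only the load rate is probably not enough: the same clip turns into a different number of samples, and so a different number of spectrogram frames, at each rate. The hop length of 256 below is an assumption for illustration, not a value read from tortoise's config, and the frame formula is the usual one for a centered STFT.

```python
HOP_LENGTH = 256   # assumed hop size, for illustration only
CLIP_SECONDS = 3   # arbitrary example clip length


def stft_frame_count(num_samples: int, hop_length: int) -> int:
    """Frames produced by a centered STFT: one per hop, plus one."""
    return 1 + num_samples // hop_length


for rate in (22050, 44100):
    samples = rate * CLIP_SECONDS
    frames = stft_frame_count(samples, HOP_LENGTH)
    print(f"{rate} Hz: {samples} samples -> {frames} frames")
```

With these numbers, the same 3 second clip roughly doubles in frame count at 44100 Hz, and each frame also spans a wider frequency range. A model trained on 22050 Hz audio would then see inputs unlike its training data, which is why retraining or resampling usually comes up.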
but it's tts, it can't get any better. it's a waste
yeah you'd need to train a model for it, that's why for tts it's a waste?
yeah i found the same voice in azure, you can make it sing with SSML tags
but for singing it's better to use https://github.com/34j/so-vits-svc-fork, it has the best vocal cloning. coz you can record the performance anyway, the audience will never know the difference
Interesting
not the current opensource SOTA