#Realistic text-to-speech open source libraries
There are a ton of TTS models available. Look up FastPitch, Flowtron, FastSpeech2, Grad-TTS, Mixer-TTS, and TalkNet. Many of these you can find in NVIDIA's repos (Grad-TTS is in Huawei's):
https://github.com/nvidia-riva
https://github.com/NVIDIA/NeMo
https://github.com/NVIDIA/flowtron
https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/SpeechSynthesis
https://github.com/huawei-noah/Speech-Backbones/tree/main/Grad-TTS
Each of these has been trained on LJSpeech, so there's a pre-trained model to download. What I would do in your position is record several samples of your own voice with matching transcripts, then resume training from the pre-trained checkpoint using only your audio data. That way you're fine-tuning the model on your voice rather than training from scratch.
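To make the "samples of yourself" step a bit more concrete, here's a rough sketch of turning a folder of your own recordings into a training manifest. It assumes a JSON-lines layout with `audio_filepath`, `text`, and `duration` fields, which is the convention NeMo manifests use; other repos (Flowtron, for example) expect their own filelist formats, so check the one you pick. The function names and the `transcripts` dict are made up for illustration.

```python
import json
import wave
from pathlib import Path


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def build_manifest(wav_dir, transcripts, out_path):
    """Write one JSON object per line for every WAV that has a transcript.

    transcripts maps a file stem (e.g. "clip_001") to its spoken text.
    Field names are an assumption based on NeMo-style manifests.
    Returns the total duration in seconds of the clips written.
    """
    total = 0.0
    with open(out_path, "w", encoding="utf-8") as out:
        for wav_path in sorted(Path(wav_dir).glob("*.wav")):
            text = transcripts.get(wav_path.stem)
            if text is None:
                continue  # skip recordings without a transcript
            duration = wav_duration_seconds(wav_path)
            entry = {
                "audio_filepath": str(wav_path),
                "text": text,
                "duration": round(duration, 3),
            }
            out.write(json.dumps(entry, ensure_ascii=False) + "\n")
            total += duration
    return total
```

You'd call it as something like `build_manifest("my_voice/wavs", {"clip_001": "Hello there."}, "train_manifest.json")`. The returned total is handy for checking how much audio you've actually collected. LJSpeech audio is 22,050 Hz mono, so it's probably safest to record or resample your clips to match before fine-tuning.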