#Realistic text-to-speech open source libraries


sacred mirage

Hey all! I would appreciate some suggestions for an open source text-to-speech library that:

1- Is the best available at the moment, with quality close to what PLAY.HT offers.
2- Can be trained to speak in your own voice.

Thanks.

prime zephyr

There are a ton of TTS models available. Look up FastPitch, Flowtron, FastSpeech2, Grad-TTS, Mixer-TTS, TalkNet. Many of these you can find in NVIDIA's repos:
https://github.com/nvidia-riva
https://github.com/NVIDIA/NeMo
https://github.com/NVIDIA/flowtron
https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/SpeechSynthesis
https://github.com/huawei-noah/Speech-Backbones/tree/main/Grad-TTS

Each of these has been trained on LJSpeech, so there's a pre-trained model to download. What I would do in your position is record several samples of your own voice with matching transcripts, then resume training from the pre-trained checkpoint using only your audio data. That way you're fine-tuning the model on your voice instead of training from scratch.
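For the data prep side, here's a rough sketch of how you might turn a folder of your own recordings into an LJSpeech-style metadata file (`id|text|normalized text`, one clip per line), since that's the layout the pre-trained models were built around. It assumes your clips are PCM WAV files and that you already have a transcript for each one. The function names, the clip ids, and the 1 to 15 second duration bounds are all made up for illustration, and each repo has its own exact manifest format, so check the recipe you pick before relying on this.

```python
import wave
from pathlib import Path


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def build_metadata(wav_dir, transcripts, min_sec=1.0, max_sec=15.0):
    """Build LJSpeech-style 'id|text|text' rows for clips that have a transcript.

    transcripts maps a clip id (file name without .wav) to its text.
    Clips shorter than min_sec or longer than max_sec are skipped;
    these bounds are illustrative assumptions, not values from any recipe.
    """
    rows = []
    for wav_path in sorted(Path(wav_dir).glob("*.wav")):
        clip_id = wav_path.stem
        text = transcripts.get(clip_id)
        if text is None:
            continue  # no transcript, so the clip can't be used for training
        duration = wav_duration_seconds(wav_path)
        if not (min_sec <= duration <= max_sec):
            continue
        # '|' is the field separator, so strip it from the text,
        # then collapse runs of whitespace into single spaces.
        text = " ".join(text.replace("|", " ").split())
        rows.append(f"{clip_id}|{text}|{text}")
    return rows


def write_metadata(rows, out_path):
    """Write the rows to a metadata file, one clip per line."""
    Path(out_path).write_text("\n".join(rows) + "\n", encoding="utf-8")
```

You'd call it with something like `write_metadata(build_metadata("my_voice/wavs", transcripts), "my_voice/metadata.csv")`, where `transcripts` is a dict you build yourself. LJSpeech audio is 22,050 Hz mono, so it's probably worth resampling your recordings to match before fine-tuning.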
