Realistic text-to-speech open source libraries | Learn AI Together | Page 1

sacred mirage · 2022-12-05T22:53:08.381Z

Hey all! I would appreciate some suggestions for the best open source text-to-speech library at the moment that: 1) gets quality close to what PLAY.HT has, and 2) can be trained to speak in your own voice. Thanks.

There are a ton of TTS models available. Look up FastPitch, Flowtron, FastSpeech2, Grad-TTS, Mixer-TTS, and TalkNet. Most of these are in NVIDIA repos (Grad-TTS is from Huawei Noah's Ark Lab):
https://github.com/nvidia-riva
https://github.com/NVIDIA/NeMo
https://github.com/NVIDIA/flowtron
https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/SpeechSynthesis
https://github.com/huawei-noah/Speech-Backbones/tree/main/Grad-TTS

Each of these has been trained on LJSpeech, so there's a pre-trained model to download. What I would do in your position is record several samples of your own voice, then resume training from the pre-trained checkpoint using only your audio data, i.e. fine-tune the model on your voice.
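Before you can resume training, your recordings need to be paired with transcripts in whatever format the repo's data loader expects. Here's a rough sketch of that prep step, not any repo's official script. It assumes a JSON-lines manifest with `audio_filepath`, `text`, and `duration` fields, which is the style NeMo uses as far as I know; check the docs of the repo you pick, since others (e.g. LJSpeech-style `metadata.csv`) differ. The function names, the `transcripts` dict, and the 1 to 15 second duration limits are all made up for illustration.

```python
import json
import wave
from pathlib import Path


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def build_manifest(clips_dir, transcripts, manifest_path, min_sec=1.0, max_sec=15.0):
    """Write one JSON object per usable clip and return the kept entries.

    transcripts maps a clip's file stem (e.g. "clip_0001") to its text.
    The field names and duration limits are assumptions, not a fixed standard.
    """
    entries = []
    for wav_path in sorted(Path(clips_dir).glob("*.wav")):
        text = transcripts.get(wav_path.stem, "").strip()
        if not text:
            continue  # skip clips that have no transcript
        duration = wav_duration_seconds(wav_path)
        if not (min_sec <= duration <= max_sec):
            continue  # skip clips that are too short or too long
        entries.append(
            {
                "audio_filepath": str(wav_path),
                "text": text,
                "duration": round(duration, 3),
            }
        )
    with open(manifest_path, "w", encoding="utf-8") as f:
        for entry in entries:
            f.write(json.dumps(entry, ensure_ascii=False) + "\n")
    return entries
```

You'd call it with something like `build_manifest("my_voice/wavs", transcripts, "my_voice/train.json")` and point the fine-tuning config at the resulting file. Also make sure your clips match the sample rate the pre-trained model was trained on (LJSpeech is 22.05 kHz mono).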
