On the same topic of a new voice for Neuro, I would like to endorse using Bark: https://github.com/suno-ai/bark
Bark is a transformer-based text-to-audio model created by Suno. Bark can generate highly realistic, multilingual speech as well as other audio, including music, background noise and simple sound effects. The model can also produce nonverbal sounds like laughing, sighing and crying. Suno provides pretrained model checkpoints ready for inference.
Note: This is meant for utterances, generated on a separate thread. It is not fast enough for real-time TTS on responses.
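For what running on a separate thread could look like, here is a minimal sketch. `UtteranceWorker` and its methods are hypothetical names I made up for illustration, not part of Bark or of Neuro's code. The worker takes any function that turns text into audio, so the threading part can be tried without Bark installed. The commented lines at the bottom assume the `bark` package from the repo above and its `preload_models` and `generate_audio` functions.

```python
import queue
import threading


class UtteranceWorker:
    """Hypothetical helper: synthesizes queued utterances on a background thread."""

    def __init__(self, synthesize):
        # synthesize: any callable mapping text -> audio (e.g. Bark's generate_audio)
        self._synthesize = synthesize
        self._jobs = queue.Queue()
        self.results = queue.Queue()  # holds (text, audio) pairs, in submission order
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, text):
        """Queue an utterance; returns immediately so the caller is not blocked."""
        self._jobs.put(text)

    def close(self):
        """Finish the jobs already queued, then stop the worker thread."""
        self._jobs.put(None)  # sentinel telling the worker to exit
        self._thread.join()

    def _run(self):
        while True:
            text = self._jobs.get()
            if text is None:
                break
            # The slow generation happens here, off the main thread.
            self.results.put((text, self._synthesize(text)))


if __name__ == "__main__":
    # Stand-in synthesizer so the sketch runs without Bark installed.
    worker = UtteranceWorker(lambda text: f"<audio for {text!r}>")
    worker.submit("Hello chat! [laughs]")
    worker.close()
    print(worker.results.get())

    # With Bark installed (assumption: the `bark` package from the linked repo):
    # from bark import generate_audio, preload_models
    # preload_models()
    # worker = UtteranceWorker(generate_audio)
```

The main loop only calls `submit` and later pulls finished clips from `results`, so slow generation never stalls the response path.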
Google Colab with sample audio: https://colab.research.google.com/drive/1eJfA2XUa-mXwdMy7DoYKVYHI1iTd9Vkt?usp=sharing#scrollTo=NyYQ--3YksJY