#how do i make text to speech
Do you need it in unity though? Is it related to unity in any way?
Unity uses c#
OK, but then what do the other languages have to do with it?
I could probably import it using C++, or at least understand how I can generate realistic
Well, anyway, you'll probably need some kind of cloud service, like Google's or OpenAI's TTS models. You might be able to run something locally too, but it's probably going to be resource-demanding and lower quality. As for the language, if it's a cloud service, you can make the requests in C#. Local solutions might require other languages as well.
If you do find a library to use locally, you want one that's C# or uses C/C++ so you can still interface with it (if it's C++, you may need to write your own helper functions that are C-compatible).
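To illustrate the "C-compatible helper functions" part, here is a minimal sketch of what such a wrapper could look like. Everything in it is made up for illustration: `ToyTtsEngine` stands in for whatever C++ library you end up finding, and the `tts_*` function names are hypothetical, not a real API.

```cpp
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical C++ engine (assumption): a placeholder for a real local TTS library.
class ToyTtsEngine {
public:
    // Fake "synthesis": returns 100 samples of silence per input character.
    std::vector<int16_t> Synthesize(const std::string& text) const {
        return std::vector<int16_t>(text.size() * 100, 0);
    }
};

extern "C" {

// Opaque handle, so the caller never sees any C++ type.
typedef void* tts_handle;

// Returns a new engine, or nullptr on failure.
tts_handle tts_create() {
    try {
        return new ToyTtsEngine();
    } catch (...) {
        return nullptr;  // exceptions must not cross the C boundary
    }
}

void tts_destroy(tts_handle h) {
    delete static_cast<ToyTtsEngine*>(h);
}

// Copies at most `capacity` samples into `out` and returns the total number
// of samples the text needs, or -1 on error. Calling it with out == nullptr
// only reports the required size.
int32_t tts_synthesize(tts_handle h, const char* utf8_text,
                       int16_t* out, int32_t capacity) {
    if (h == nullptr || utf8_text == nullptr) return -1;
    try {
        const std::vector<int16_t> samples =
            static_cast<ToyTtsEngine*>(h)->Synthesize(utf8_text);
        const int32_t needed = static_cast<int32_t>(samples.size());
        if (out != nullptr && capacity > 0) {
            const int32_t n = needed < capacity ? needed : capacity;
            std::memcpy(out, samples.data(),
                        static_cast<std::size_t>(n) * sizeof(int16_t));
        }
        return needed;
    } catch (...) {
        return -1;
    }
}

}  // extern "C"
```

Built as a native plugin, functions like these can be declared on the C# side with `[DllImport]`. The flat signatures (plain pointers and fixed-size integers only) are what make that binding straightforward.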
Has to be offline
That's why I'm asking for c, cs, cpp, cm because others are slow. And I need python for another project but it's not required I can just use cpp
Well, there are many libraries. Unity has its own AI model inference engine based on ONNX, so if you find a model in that format you can use it as is.
Other than that, there's PyTorch and stuff. I think llama.cpp also allows TTS. You'll need to research. There are just too many options.
I need my own to clone a voice
Not robotic voice
That is going to be an AI model that you train yourself. Once you get a model trained, you can run it locally in Unity with the Sentis package.
i cant even train text model
with 200gb high quality data
how am i supposed to get proper voice data
This is getting pretty far outside the Unity world, but you can look up XTTS on Hugging Face
Many modern text to speech models allow voice sampling for replicating your voice. Look into the things I mentioned.
Then you're going to need to lower your standards
clearly it has been done
and not by training it
Yes, using existing text-to-speech voice banks that you've decided aren't good enough for you
they dont clone voice
and they dont use proper language
like c++ or c#
Right. And "Cloning a voice" isn't something you can just do
It usually involves training a large AI model on a bunch of clips
which you've also refused
So, you need to either do that, or lower your standards
it has been done without using a large ai model
nobody uses it
except for microsoft
Not really. Text to speech is either using recorded voice banks, or AI
Both of which you've refused
what do u mean by voice banks
Lots of recorded sounds, either entire words or phonemes, spliced together to make the sentence you've typed in
A.k.a, that thing you said you didn't want to do
Because you want to clone a voice
how will that allow different tones and avoid extra letters like h
By having a large voice bank
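For a rough idea of what "spliced together" means in practice, here is a small sketch of concatenative playback from a voice bank. It assumes the bank is just a map from a unit name (a phoneme or a whole word) to recorded samples; `VoiceBank`, `AppendWithCrossfade`, and `Splice` are hypothetical names, and real systems do a lot more (unit selection, pitch and duration matching).

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

using Clip = std::vector<float>;                 // mono samples in [-1, 1]
using VoiceBank = std::map<std::string, Clip>;   // unit name -> recording

// Appends `next` to `out`, linearly blending the last `overlap` samples of
// `out` with the first `overlap` samples of `next` to soften the seam.
void AppendWithCrossfade(Clip& out, const Clip& next, std::size_t overlap) {
    const std::size_t n = std::min({overlap, out.size(), next.size()});
    const std::size_t start = out.size() - n;
    for (std::size_t i = 0; i < n; ++i) {
        const float t = static_cast<float>(i + 1) / static_cast<float>(n + 1);
        out[start + i] = out[start + i] * (1.0f - t) + next[i] * t;
    }
    out.insert(out.end(),
               next.begin() + static_cast<std::ptrdiff_t>(n), next.end());
}

// Builds one clip from the requested units. Returns false if a unit was never
// recorded, which is the core limitation: you can only play what is in the bank.
bool Splice(const VoiceBank& bank, const std::vector<std::string>& units,
            std::size_t overlap, Clip& out) {
    out.clear();
    for (const std::string& unit : units) {
        const auto it = bank.find(unit);
        if (it == bank.end()) return false;
        AppendWithCrossfade(out, it->second, overlap);
    }
    return true;
}
```

Different tones would need separate recordings of the same unit under different keys, which is why the bank has to be large.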
You don't have to train a model from scratch. You can fine-tune one on a smaller set of data. That should be more achievable with limited hardware and will probably give better results than existing voice cloning capabilities (although I've seen some voice cloning/sampling examples that sound quite impressive).
As for the programming languages, I don't see how that's a problem. You can have interop with virtually any language, at least if you're targeting PC/Mac platforms.
I wonder if it's even possible in the first place to clone your voice exactly without training an AI
Prior to a lot of the recent AI TTS examples, the voices often sounded robotic and clearly synthetic
Without AI, you can only reproduce the phonemes you have recorded. At best you get the Star Trek computer voice. Or Siri, I guess.
what is your big picture goal?