# Can I choose the version of the Conversational AI TTS model?
Currently, you cannot manually select the speech model through the web interface. The model is automatically determined based on the language of the agent. For English, the system defaults to Turbo V2, while for other languages, it switches to Turbo V2.5, our multilingual model.
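The selection rule described above can be pictured as a small lookup. This is only an illustrative sketch of the stated behavior, not the platform's actual code: the function name `default_tts_model` and the use of language codes such as `"en"` and `"fi"` are assumptions, and the model names are the display names from the answer above.

```python
def default_tts_model(language_code: str) -> str:
    """Return the model name the answer above says is used for a language.

    Hypothetical helper: English agents get Turbo V2, every other
    language gets the multilingual Turbo V2.5.
    """
    # Treat regional variants such as "en-GB" as English (an assumption).
    primary = language_code.strip().lower().split("-")[0]
    return "Turbo V2" if primary == "en" else "Turbo V2.5"


if __name__ == "__main__":
    for code in ("en", "en-GB", "fi"):
        print(code, "->", default_tts_model(code))
```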
When was the multilingual model last updated?
The multilingual model was last updated one month ago.
I'd like to roll back to the previous version, because the current version performs worse in Finnish than the previous one.
Currently, there is no option to roll back to a previous version of the AI model. Variations in voice output can occur due to the non-deterministic nature of AI and periodic updates to voice models. If you're experiencing issues with Finnish, consider using properly cloned voices specifically for Finnish and utilizing the Projects feature for better pronunciation management. If you have further questions or need assistance, feel free to reach out.
The issue I'm facing is in the transcription of spoken Finnish, so I cannot control that?
Variations in voice output can occur due to several reasons, including the non-deterministic nature of AI and periodic updates to voice models. Unfortunately, there is no set timeline for ensuring voice consistency post-update, and rolling back to a previous version is not currently an option.
To improve consistency, consider using properly cloned voices specifically for Finnish, utilizing the Projects feature for better pronunciation management, writing out numbers in full, and keeping text segments below 800 characters to minimize language switching. If you have further questions or need assistance, feel free to reach out.
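The advice about keeping text segments below 800 characters can be applied before text is sent for speech generation. The following is a minimal sketch under stated assumptions: the helper name `split_into_segments` is hypothetical, sentences are detected naively by `.`, `!`, or `?` followed by whitespace, and nothing here calls any ElevenLabs API.

```python
import re


def split_into_segments(text: str, limit: int = 800) -> list[str]:
    """Split text into segments strictly shorter than `limit` characters.

    Sentences are kept whole where possible; an overlong sentence is
    split at word boundaries, and an overlong word is cut hard.
    """
    max_len = limit - 1
    # Break on whitespace that follows sentence-ending punctuation.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

    pieces = []
    for sentence in sentences:
        if len(sentence) <= max_len:
            pieces.append(sentence)
            continue
        for word in sentence.split():
            while len(word) > max_len:
                pieces.append(word[:max_len])
                word = word[max_len:]
            pieces.append(word)

    # Greedily pack pieces into segments that stay under the limit.
    segments = []
    current = ""
    for piece in pieces:
        candidate = piece if not current else current + " " + piece
        if len(candidate) <= max_len:
            current = candidate
        else:
            segments.append(current)
            current = piece
    if current:
        segments.append(current)
    return segments


if __name__ == "__main__":
    sample = "Tämä on esimerkkilause. " * 100
    for segment in split_into_segments(sample):
        print(len(segment))
```

Splitting at sentence boundaries rather than at a fixed character offset avoids cutting a word or phrase in half, which would otherwise introduce its own pronunciation problems.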
The output of the voices is fine. The issue is that they don't understand what I'm saying, because the transcription from speech to text fails.
The transcript of the audio is for information only and isn't used by the AI to generate the audio. To improve the accuracy of speech-to-speech conversion, ensure that the Similarity setting is high, ideally between 90% and 95%, and that Speaker Boost is on. Additionally, using the Multilingual STS model can help reduce mispronunciations, even for English audio. If you encounter errors, you can regenerate the audio under certain conditions: the prompt, voice, and model must stay the same, and the first generation must have been made less than two hours ago.
Do Speaker Boost and Similarity also apply to Conversational AI?
Speaker Boost and Similarity settings are specifically related to the Speech-to-Speech (STS) feature. They help in replicating the delivery aspects of the original speaker, such as accent, tone, and intonation. If you are using the Conversational AI feature, these settings may not directly apply.
How do I then control these for Conversational AI?