# Is it possible to train 11labs on one language to prevent drifting into another language? I want to
Unfortunately, you cannot fine-tune the AI model on languages other than the ones it has already been trained on. The model also does not let you select the output language with a tag or marker. Instead, you "select" the language by writing the text in the language you want the AI to speak.
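To make "the text selects the language" concrete, here is a minimal sketch that builds (but does not send) a text-to-speech request. It assumes the public REST endpoint `https://api.elevenlabs.io/v1/text-to-speech/{voice_id}`, the `xi-api-key` header and the `eleven_multilingual_v2` model ID; treat these as assumptions and check them against the current API reference. The voice ID and API key are placeholders. The payload has no language field: the language comes only from `text`, and the accent from the voice behind `voice_id`.

```python
import json
import urllib.request

# Assumed endpoint; verify against the current ElevenLabs API reference.
API_URL = "https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"


def build_tts_request(voice_id: str, text: str, api_key: str,
                      model_id: str = "eleven_multilingual_v2") -> urllib.request.Request:
    """Build (but do not send) a text-to-speech request.

    There is no language parameter in this sketch: the language is implied
    by `text`, while the accent comes from the voice identified by `voice_id`.
    """
    payload = {"text": text, "model_id": model_id}
    return urllib.request.Request(
        API_URL.format(voice_id=voice_id),
        data=json.dumps(payload).encode("utf-8"),
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder ID for a voice cloned from a native Dutch speaker.
    req = build_tts_request("DUTCH_VOICE_ID", "Goedemorgen, hoe gaat het met je?", "YOUR_API_KEY")
    print(req.full_url)
    print(req.data.decode("utf-8"))
```

Passing the returned object to `urllib.request.urlopen` would send it, and the generated audio would come back in the response body. To get Dutch output, the Dutch text does the "selecting"; a Dutch-cloned voice supplies the accent.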
If you are using a voice that is not native to the language (for example, one of the premade or generated voices, which are English), the AI might speak other languages with a slight English accent. For optimal results, clone a voice that speaks the target language with the correct accent. This is especially important for languages that are very similar and share many words, because it gives the AI the most information for choosing the right language and pronunciation.
There have been reports of “language switching”, particularly between languages that look similar in writing but differ in pronunciation or accent. This happens when the AI lacks enough context, gets confused, and switches language in the middle of a generation. The issue appears to be less frequent with a well-cloned voice, that is, one cloned from someone speaking the correct language with the correct accent. The team at ElevenLabs is actively working on this issue.
In summary, while you cannot train the AI on a specific language, you can mitigate most of the language switching and accent issues by using a properly cloned voice that speaks the language you want the AI to speak.
Does this mean that if I upload a custom voice that speaks Dutch, it is less likely to switch to languages other than Dutch?
Yes, that's correct. If you clone a voice speaking Dutch, the AI should speak Dutch with the correct accent and pronunciation. The language is determined by the text, while the accent and pronunciation are determined by the voice itself, so a voice cloned from Dutch speech should maintain the Dutch language and accent when generating speech from Dutch text. However, the AI may still have some limitations, especially with languages that are very similar and share many words.
I mean, with Voice Pro you can upload 6 hours of one voice talking. If I upload a file with a Dutch speaker, will that reduce the likelihood of the AI switching to another language?
Yes, if you clone a voice speaking Dutch using the Voice Pro feature, it should help reduce the likelihood of the AI switching to another language. The AI model uses the cloned voice as a reference, so if the voice speaks Dutch, the model is more likely to generate speech in Dutch. However, generative AI can be unpredictable, and there may still be instances where it switches accents or languages.
To get the optimal results, it's recommended to clone a voice that speaks the original language with the correct accent. This is especially important when dealing with languages that are very similar and share a lot of common words. This ensures that the AI has the most information to understand which pronunciation and language it should choose.
Also, when generating long-form content, the AI may, under certain circumstances, drift between accents or even between languages, especially closely related ones. The Projects feature, which is designed for very long-form content, appears to minimize most of these issues. If you do encounter one of them, you can regenerate just the section you're unhappy with instead of regenerating every paragraph.
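The "regenerate only the bad section" idea can be approximated on the client side by rendering long text paragraph by paragraph and caching each result. The sketch below is a hypothetical illustration, not the Projects feature or its API: `synthesize` stands in for whatever text-to-speech call you use, and `ParagraphRenderer` and `split_paragraphs` are made-up names. Splitting on blank lines keeps whole sentences together in each request, so each chunk carries its full local context.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a real text-to-speech call: takes one paragraph, returns audio bytes.
Synthesize = Callable[[str], bytes]


def split_paragraphs(text: str) -> List[str]:
    """Split on blank lines and drop empty chunks, so each chunk is a complete paragraph."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


class ParagraphRenderer:
    """Render each paragraph once and cache its audio, so a single paragraph that
    drifted into another accent or language can be regenerated on its own."""

    def __init__(self, text: str, synthesize: Synthesize) -> None:
        self.paragraphs = split_paragraphs(text)
        self.synthesize = synthesize
        self.audio: Dict[int, bytes] = {}

    def render_all(self) -> None:
        # Only paragraphs without cached audio are synthesized.
        for i, paragraph in enumerate(self.paragraphs):
            if i not in self.audio:
                self.audio[i] = self.synthesize(paragraph)

    def regenerate(self, index: int) -> None:
        # Re-send just this paragraph; cached audio for all others is kept.
        self.audio[index] = self.synthesize(self.paragraphs[index])

    def combined(self) -> bytes:
        # Naive byte concatenation in paragraph order; real audio may need proper joining.
        return b"".join(self.audio[i] for i in range(len(self.paragraphs)))


if __name__ == "__main__":
    calls: List[str] = []

    def fake_synthesize(paragraph: str) -> bytes:
        calls.append(paragraph)
        return paragraph.encode("utf-8")

    renderer = ParagraphRenderer("Eerste alinea.\n\nTweede alinea.\n\nDerde alinea.", fake_synthesize)
    renderer.render_all()
    renderer.regenerate(1)  # suppose paragraph 2 drifted into another language
    print(calls)
```

With the fake synthesizer, only the second paragraph is sent twice; the first and third are synthesized once. Real audio formats may need proper joining rather than plain byte concatenation.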
Remember: the language is determined by the text, and the accent and pronunciation are determined by the voice itself. The model needs both of these contexts to function optimally.
After how many tokens do these artifacts become visible? You say Projects solves this problem; is it available through the API?
I'm sorry, but I can't provide the information you're looking for.
Is Projects available through the API?