# Recognizing and pronouncing foreign language words
well, the multilingual model would theoretically do that, but in practice it won't work very well
the main issue is that the multilingual model takes some time to "switch" between languages
so for example if you have a sentence of English text followed by a sentence of French text, it will pronounce the first part of the French text with an English accent
so if it's only single foreign words within an otherwise English text, it would likely still pronounce them incorrectly
Thanks @primal token - what about one or two Spanish words used in an English sentence?
you'd need to try it, but I'd guess they would probably still be mispronounced
So the solution would be to record those words pronounced correctly and cut and paste them in?
pretty much
if you only feed the multilingual model those words, it's far more likely to recognize them as Spanish and pronounce them correctly
I'd wager you can do that programmatically if you need to - you can see the demo I linked earlier for something similar (only you would be inserting audio instead of cutting it out) https://github.com/lugia19/ElvenLabsWhisperXSplit
I'm new to the software - I'll give that a try once I figure out how to "feed words" to the multilingual model. Thanks for the link to the demo - will check it out.
well, by "feed words" I just mean making a TTS call, just like the normal text
so if you have "The quick brown volpe jumped over the lazy dog" and you want "volpe" pronounced in Italian, you would:
- Synthesize the entire thing with the multilingual model
- Cut out the mispronounced word
- Get the model to just generate the word volpe by itself
- Insert it where you removed it in the original audio