OpenAI's Text-to-Speech (TTS) model supports many languages, not just English. The spoken language is determined by the language of the input text. Language support generally follows the Whisper model, which covers a wide range of languages, including English, Spanish, French, German, Chinese, and Japanese. To generate spoken audio in any of these languages, provide the input text in that language.
Here is the list of supported languages:
Afrikaans, Arabic, Armenian, Azerbaijani, Belarusian, Bosnian, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, German, Greek, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Kannada, Kazakh, Korean, Latvian, Lithuanian, Macedonian, Malay, Marathi, Maori, Nepali, Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swahili, Swedish, Tagalog, Tamil, Thai, Turkish, Ukrainian, Urdu, Vietnamese, and Welsh.
You can find more information about this in the OpenAI documentation (https://platform.openai.com/docs/guides/text-to-speech).
To generate speech in a language other than English, provide the input text in that language. For example, to generate speech in Spanish:
curl https://api.openai.com/v1/audio/speech \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1",
    "input": "El zorro marrón rápido salta sobre el perro perezoso.",
    "voice": "alloy"
  }' \
  --output speech.mp3
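The same request can be sketched in Python using only the standard library. This is a minimal illustration that mirrors the curl call above, not an official client: the helper names build_speech_request and synthesize are hypothetical, and the sketch assumes the API key is available in the OPENAI_API_KEY environment variable.

```python
import json
import os
import urllib.request

# Same endpoint as the curl example above.
API_URL = "https://api.openai.com/v1/audio/speech"


def build_speech_request(text, voice="alloy", model="tts-1", api_key=None):
    """Build (but do not send) the HTTP request for one TTS call.

    Hypothetical helper for illustration only.
    """
    # The payload carries the same three fields as the curl example.
    # The spoken language follows the language of `text`.
    payload = {"model": model, "input": text, "voice": voice}
    # Assumption: the key lives in OPENAI_API_KEY unless passed explicitly.
    key = api_key or os.environ.get("OPENAI_API_KEY", "")
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize(text, out_path, **kwargs):
    """Send the request and write the returned audio bytes to out_path."""
    request = build_speech_request(text, **kwargs)
    with urllib.request.urlopen(request) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)


# Example call (needs a valid API key and network access):
# synthesize("El zorro marrón rápido salta sobre el perro perezoso.", "speech.mp3")
```

Keeping request construction separate from sending makes it possible to inspect the payload without calling the API. Switching languages only requires changing the text passed in.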