#speech with other language

normal vine
#
curl https://api.openai.com/v1/audio/speech \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1",
    "input": "The quick brown fox jumped over the lazy dog.",
    "voice": "alloy"
  }' \
  --output speech.mp3

I want to know whether its vocalization is based on the language of the input text, or whether it currently supports only English.

strange dagger BOT
#

Hi @normal vine. I'm kapa.ai, your friendly support bot. I will try to answer your question by searching through the knowledge base. :wave:

#

The Text-to-Speech (TTS) model developed by OpenAI supports multiple languages, not just English. The vocalization method is based on the language of the input text. The TTS model generally follows the Whisper model in terms of language support. Whisper supports a wide range of languages, including but not limited to English, Spanish, French, German, Chinese, Japanese, and many more. You can generate spoken audio in these languages by providing the input text in the language of your choice.

Here is the list of supported languages:
Afrikaans, Arabic, Armenian, Azerbaijani, Belarusian, Bosnian, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, German, Greek, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Kannada, Kazakh, Korean, Latvian, Lithuanian, Macedonian, Malay, Marathi, Maori, Nepali, Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swahili, Swedish, Tagalog, Tamil, Thai, Turkish, Ukrainian, Urdu, Vietnamese, and Welsh.

You can find more information about this in the OpenAI documentation (https://platform.openai.com/docs/guides/text-to-speech).

So, if you want to generate speech in a different language, you can simply provide the input text in that language. For example, if you want to generate speech in Spanish, you can do:

curl https://api.openai.com/v1/audio/speech \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1",
    "input": "El zorro marrón rápido salta sobre el perro perezoso.",
    "voice": "alloy"
  }' \
  --output speech.mp3
#

In this example, "El zorro marrón rápido salta sobre el perro perezoso." is the Spanish translation of "The quick brown fox jumps over the lazy dog."
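
The same request can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the helper names `build_speech_request` and `synthesize` are made up for illustration, and it assumes the endpoint and fields shown in the curl command above. Note that there is no separate language field; the language comes from the input text itself, so the only thing to take care of is sending the body as UTF-8.

```python
import json
import urllib.request

SPEECH_URL = "https://api.openai.com/v1/audio/speech"


def build_speech_request(text, api_key, model="tts-1", voice="alloy"):
    """Build (but do not send) a POST request for the speech endpoint.

    Hypothetical helper: it mirrors the fields of the curl example above.
    """
    payload = {"model": model, "input": text, "voice": voice}
    # ensure_ascii=False keeps characters such as "ó" as-is; the body is
    # then encoded explicitly as UTF-8 bytes.
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        SPEECH_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize(text, api_key, out_path="speech.mp3"):
    """Send the request and write the returned audio bytes to out_path."""
    request = build_speech_request(text, api_key)
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as f:
        f.write(response.read())
```

Calling `synthesize("El zorro marrón rápido salta sobre el perro perezoso.", api_key)` with a valid key would perform the network call; `build_speech_request` on its own can be inspected offline.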

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:

#

@normal vine kapa.ai is still learning and improving, please let us know how it did by reacting below

normal vine
#

@strange dagger

I didn't see an option to adjust whether it's HD in the options. Is it differentiated by response_format? If so, please tell me which response_format options are HD.

strange dagger BOT
#

Let me check the knowledge base for you. One moment! :books:

#

The response_format parameter does not control the quality of the output. It only determines the format in which the generated content is returned. For instance, in the context of image generation, the response_format could be url or b64_json, indicating whether the generated image is returned as a URL or as a base64-encoded JSON object.

The quality of the output, such as whether it's in HD or not, is controlled by a different parameter. For example, in the context of image generation with the dall-e-3 model, the quality parameter controls the quality of the generated image. It can be set to standard or hd, with hd creating images with finer details and greater consistency across the image.

Here's an example from the OpenAI API documentation (https://platform.openai.com/docs/api-reference):

{
  "prompt": "A cute baby sea otter",
  "model": "dall-e-3",
  "n": 1,
  "quality": "hd",
  "response_format": "url",
  "size": "1024x1024"
}

In this example, the quality parameter is set to hd, indicating that the generated image will be in high definition. The response_format parameter is set to url, indicating that the generated image will be returned as a URL.

Please note that the quality parameter is only supported for the dall-e-3 model. Other models or APIs may have different parameters for controlling the quality of the output.

normal vine
#

@strange dagger I am asking about audio, not image generation

strange dagger BOT
#

Let me check the knowledge base for you. One moment! :books:

#

I'm sorry for the confusion. For audio, response_format does not control quality either. On the speech endpoint, quality is selected through the model parameter: tts-1 is the standard model and tts-1-hd is the higher-quality one, while response_format only chooses the audio encoding (for example mp3, opus, aac, or flac). On the transcription endpoint, response_format specifies the format of the transcript output, and the available options are json, text, srt, verbose_json, or vtt.

Here's an example from the OpenAI API documentation (https://platform.openai.com/docs/api-reference):

{
  "file": "<audio file>",
  "model": "whisper-1",
  "language": "<language code>",
  "prompt": "<prompt text>",
  "response_format": "json",
  "temperature": 0,
  "timestamp_granularities[]": "<granularity>"
}

In this example, the response_format parameter is set to json, indicating that the transcript will be returned in JSON format.

If you need to adjust the quality of the audio output, I recommend checking the OpenAI API documentation or reaching out to OpenAI support for more information.
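
One thing worth checking there: as far as I know, the text-to-speech guide lists a higher-quality model, tts-1-hd, next to tts-1, so HD output would be chosen through the model field rather than through response_format. The sketch below only builds the request body under that assumption; `speech_payload` is a hypothetical helper, and the model name and audio format values should be verified against the current documentation before use.

```python
import json


def speech_payload(text, hd=False, audio_format="mp3", voice="alloy"):
    """Build a request body for the speech endpoint (hypothetical helper).

    Assumption: quality is picked via the model name ("tts-1" or
    "tts-1-hd"), while response_format only selects the audio encoding,
    such as mp3, opus, aac, or flac.
    """
    model = "tts-1-hd" if hd else "tts-1"
    return {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": audio_format,
    }


if __name__ == "__main__":
    # Print the body that would be passed to curl's -d option.
    print(json.dumps(speech_payload("Hello", hd=True, audio_format="flac"), indent=2))
```

The printed JSON can be pasted into the -d argument of the curl command from the first message in place of the original body.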
