audio/transcriptions full example with curl | OpenAI

prime isle Mar 7, 2024, 8:17 AM

#

Please provide me with an example of transcription. You must include all optional items. (use curl)

vernal talon BOT Mar 7, 2024, 8:17 AM

#

Hi @prime isle. I'm kapa.ai, your friendly support bot. I will try to answer your question by searching through the knowledge base. :wave:

#

Based on the information provided in the extracts, here is an example of how to use the audio/transcriptions endpoint with curl:

curl https://api.openai.com/v1/audio/transcriptions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: multipart/form-data" \
  -F file="@/path/to/file/audio.mp3" \
  -F model="whisper-1"

This command sends a POST request to the audio/transcriptions endpoint of the OpenAI API. The -H flags are used to set the Authorization and Content-Type headers. The -F flags are used to specify the form data for the request, which includes the audio file to be transcribed and the model to be used for the transcription.

Please replace "/path/to/file/audio.mp3" with the path to your audio file and $OPENAI_API_KEY with your OpenAI API key.

Unfortunately, the extracts do not provide information on all optional parameters for the audio/transcriptions endpoint when using curl. For a full list of available parameters, please refer to the OpenAI API Reference (https://platform.openai.com/docs/api-reference/audio/createTranscription).
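
For reference, a request that sets the optional form fields documented in the API reference (language, prompt, response_format, temperature, and timestamp_granularities[]) might look like the sketch below; the list may have changed since. The transcribe function name, the CURL override variable, and every field value are assumptions made for illustration, not part of the API. Note that timestamp_granularities[] is only honored when response_format is verbose_json.

```shell
# Sketch: audio/transcriptions with the optional form fields set.
# transcribe and CURL are hypothetical helpers for this example only.
# Set CURL=echo to print the request instead of sending it.
transcribe() {
  local audio_file="$1"
  "${CURL:-curl}" https://api.openai.com/v1/audio/transcriptions \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    -H "Content-Type: multipart/form-data" \
    -F file="@${audio_file}" \
    -F model="whisper-1" \
    -F language="en" \
    -F prompt="OpenAI, DALL·E, GPT-3, ChatGPT" \
    -F response_format="verbose_json" \
    -F temperature="0.2" \
    -F "timestamp_granularities[]=word" \
    -F "timestamp_granularities[]=segment"
}
```

Calling transcribe /path/to/file/audio.mp3 sends the request, while CURL=echo transcribe /path/to/file/audio.mp3 only prints the arguments that would be passed to curl.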

Source (https://community.openai.com/t/406444)

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:

#

@prime isle kapa.ai is still learning and improving, please let us know how it did by reacting below

prime isle Mar 7, 2024, 8:19 AM

#

@vernal talon I need you to provide: language, prompt

vernal talon BOT Mar 7, 2024, 8:19 AM

#

Let me check the knowledge base for you. One moment! :books:

#

Based on the information provided in the extracts, here is an example of how to use the audio/transcriptions endpoint with curl, including the language and prompt parameters:

curl https://api.openai.com/v1/audio/transcriptions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: multipart/form-data" \
  -F file="@/path/to/file/audio.mp3" \
  -F model="whisper-1" \
  -F language="en" \
  -F prompt="The transcript is about OpenAI which makes technology like DALL·E, GPT-3, and ChatGPT with the hope of one day building an AGI system that benefits all of humanity"

In this command, the language parameter is set to "en" (English), and the prompt parameter is set to a specific string. The prompt can be used to improve the quality of the transcripts generated by the Whisper API. The model will try to match the style of the prompt, so it will be more likely to use capitalization and punctuation if the prompt does too.

Please replace "/path/to/file/audio.mp3" with the path to your audio file and $OPENAI_API_KEY with your OpenAI API key.

Source (https://platform.openai.com/docs/api-reference)

prime isle Mar 7, 2024, 8:22 AM

#

@vernal talon so the language parameter means Set 1, right?
https://en.wikipedia.org/wiki/List_of_ISO_639_language_codes

List of ISO 639 language codes

ISO 639 is a standardized nomenclature used to classify languages. Each language is assigned a two-letter (set 1) and three-letter lowercase abbreviation (sets 2–5).

vernal talon BOT Mar 7, 2024, 8:22 AM

#

Let me check the knowledge base for you. One moment! :books:

#

Yes, the language parameter in the OpenAI API request should be set using the ISO 639-1 language codes. These are two-letter codes that represent different languages. For example, 'en' represents English, 'es' represents Spanish, 'fr' represents French, and so on.

You can find a full list of these codes in the Set 1 column of the Wikipedia page that you linked (https://en.wikipedia.org/wiki/List_of_ISO_639_language_codes).

Here's how you would set the language parameter to Spanish ('es') in the curl command:

curl https://api.openai.com/v1/audio/transcriptions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: multipart/form-data" \
  -F file="@/path/to/file/audio.mp3" \
  -F model="whisper-1" \
  -F language="es" \
  -F prompt="The transcript is about OpenAI which makes technology like DALL·E, GPT-3, and ChatGPT with the hope of one day building an AGI system that benefits all of humanity"

Please replace "/path/to/file/audio.mp3" with the path to your audio file and $OPENAI_API_KEY with your OpenAI API key.

Source (https://platform.openai.com/docs/api-reference)

prime isle Mar 7, 2024, 8:25 AM

#

@vernal talon If my lyrics are in Chinese, can I output the results in English?

vernal talon BOT Mar 7, 2024, 8:25 AM

#

Let me check the knowledge base for you. One moment! :books:

#

Based on the information provided in the extracts, the Whisper model developed by OpenAI can transcribe audio in various languages, including Chinese, and its translation mode is Any-to-English. This means that if your lyrics are in Chinese, Whisper can transcribe them into Chinese text or translate them into English text, but it cannot translate them into any language other than English.

Here's an extract from a discussion on GitHub that explains this:

From the name of the file, assuming you're trying to translate English speech into Chinese text, this is the expected behavior. Whisper can do transcription in Chinese, but it only does Any-to-English speech translation.

So, to get English output from Chinese lyrics, send the audio to the audio/translations endpoint rather than audio/transcriptions. You would only need a separate translation service or tool if the target language were something other than English.
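
Because Whisper's speech translation is Any-to-English, Chinese audio can be sent to the separate audio/translations endpoint to get English text directly. A minimal sketch follows; the translate_to_english function name and the CURL override variable are assumptions made for illustration, and the endpoint takes no language field because the output is always English.

```shell
# Sketch: translate non-English speech (e.g. Chinese lyrics) into English text.
# translate_to_english and CURL are hypothetical helpers for this example only.
# Set CURL=echo to print the request instead of sending it.
translate_to_english() {
  local audio_file="$1"
  "${CURL:-curl}" https://api.openai.com/v1/audio/translations \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    -H "Content-Type: multipart/form-data" \
    -F file="@${audio_file}" \
    -F model="whisper-1"
}
```

For example, translate_to_english /path/to/file/chinese_lyrics.mp3 would request an English rendering of the audio, where the file path is a placeholder.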

Source (https://github.com/openai/whisper/discussions/386)

prime isle Mar 7, 2024, 8:28 AM

#

@vernal talon If the audio is in Chinese and contains some sentences with English words, what will the output look like?

vernal talon BOT Mar 7, 2024, 8:28 AM

#

Let me check the knowledge base for you. One moment! :books:

#

Based on the information provided in the extracts, the Whisper model developed by OpenAI is designed to handle monolingual audio inputs. It doesn't support code-switching inputs (inputs that switch between languages) very well.

Here's an extract from a discussion on GitHub that explains this:

It's intended for monolingual audio inputs, and --language should specify the language used in the audio (English or otherwise). Whisper doesn't support code-switching inputs very well, and some third-party tools could be used to segment the audio into sentences and separately run Whisper to transcribe each.

So, if your audio is primarily in Chinese but contains some sentences with English words, the transcription might not be as accurate as it would be for a monolingual audio input. The model might not correctly transcribe the English words, especially if they are embedded within Chinese sentences.
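
The workaround mentioned in the quoted discussion, segmenting the audio and transcribing each piece with its own language, could be sketched roughly as follows. It assumes the audio has already been split by some other tool into files whose language is known; the segment file names, the transcribe_segment and transcribe_all helpers, and the CURL override variable are all hypothetical.

```shell
# Sketch: transcribe pre-split segments, each with its own language code.
# Set CURL=echo to print the requests instead of sending them.
transcribe_segment() {
  local audio_file="$1" language="$2"
  "${CURL:-curl}" https://api.openai.com/v1/audio/transcriptions \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    -H "Content-Type: multipart/form-data" \
    -F file="@${audio_file}" \
    -F model="whisper-1" \
    -F language="${language}" \
    -F response_format="text"
}

# Each argument has the form "<file>:<ISO 639-1 code>".
transcribe_all() {
  local entry
  for entry in "$@"; do
    # Text before the first colon is the file, text after the last colon is the language.
    transcribe_segment "${entry%%:*}" "${entry##*:}"
  done
}
```

A call such as transcribe_all "segment_001.mp3:zh" "segment_002.mp3:en" would then issue one request per segment, in order.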

Source (https://github.com/openai/whisper/discussions/1160)

prime isle Mar 7, 2024, 8:54 AM

#

@vernal talon If I change the temperature to 0, does the content of the prompt become irrelevant? In other words, does the prompt only have an effect when the temperature is not 0? Is my understanding correct?

vernal talon BOT Mar 7, 2024, 8:54 AM

#

Let me check the knowledge base for you. One moment! :books:

#

No, your understanding is not entirely correct. The temperature parameter and the prompt are both important and serve different purposes in controlling the output of the model.

The temperature parameter controls the randomness of the model's output. A higher temperature value (closer to 1) makes the output more random, while a lower value (closer to 0) makes the output more deterministic, meaning the model is more likely to choose the most probable transcript. For this endpoint, the API reference also notes that a value of 0 lets the model use log probability to automatically increase the temperature until certain thresholds are hit.

On the other hand, the prompt is optional text that you send along with the audio. It sets the context for the transcription and guides the model toward the vocabulary, spelling, and punctuation style you expect.

Even if you set the temperature to 0, the prompt still matters. The model will generate the most probable transcript given the audio and the prompt. If you change the prompt, the model's output may change as well, even if the temperature is set to 0.

So, both the temperature and the prompt play crucial roles in controlling the model's output, and they work independently of each other.
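
One way to check this for yourself is to send the same audio twice at temperature 0 with different prompts and compare the results. The sketch below assumes a hypothetical transcribe_with_prompt helper and a CURL override variable, neither of which is part of the API. It passes the prompt with curl's --form-string so that characters such as ; or a leading @ in the prompt are sent literally.

```shell
# Sketch: fixed temperature of 0, caller-supplied prompt.
# Set CURL=echo to print the request instead of sending it.
transcribe_with_prompt() {
  local audio_file="$1" prompt="$2"
  "${CURL:-curl}" https://api.openai.com/v1/audio/transcriptions \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    -H "Content-Type: multipart/form-data" \
    -F file="@${audio_file}" \
    -F model="whisper-1" \
    -F temperature="0" \
    --form-string "prompt=${prompt}"
}
```

For example, saving the output of transcribe_with_prompt audio.mp3 "OpenAI, ChatGPT" and of transcribe_with_prompt audio.mp3 "open ai chat gpt" to two files and running diff on them would show whether the prompt changed spelling or punctuation for that recording.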

Source (https://help.openai.com/en/articles/6643200-why-am-i-getting-different-completions-on-playground-vs-the-api)
