#How can I turn off chat completion when doing only audio transcription?


cosmic yoke
#

I am using the OpenAI Realtime API to generate transcriptions from audio. Right now it generates the audio transcription but also generates a text completion. How can I turn off the text completion?
Here is my session configuration for the Realtime API:
$post_fields = [
    "model" => "gpt-4o-realtime-preview-2024-12-17",
    "modalities" => ["text"],
    "input_audio_transcription" => [
        "model" => 'whisper-1',
    ],
    "turn_detection" => [
        'type' => 'server_vad',
        'silence_duration_ms' => 1
    ]
];

lusty pagoda BOT
#

Hi @cosmic yoke. I'm kapa.ai, your friendly support bot. I will try to answer your question by searching through the knowledge base. :wave:

#

To disable text completion while using the OpenAI Realtime API for audio transcription, you need to adjust the modalities setting in your session configuration. Currently, your configuration includes "modalities" => ["text"], which indicates that text output is enabled. To focus solely on audio transcription without generating text completions, you should remove "text" from the modalities array.

Here's how you can modify your session configuration:

$post_fields = [
    "model" => "gpt-4o-realtime-preview-2024-12-17",
    "modalities" => ["audio"], // Change this to focus on audio only
    "input_audio_transcription" => [
        "model" => 'whisper-1',
    ],
    "turn_detection" => [
        'type' => 'server_vad',
        'silence_duration_ms' => 1
    ]
];

By setting "modalities" => ["audio"], you ensure that the session is configured to handle audio input and output only, without generating text completions. This should help you achieve the desired behavior of focusing solely on audio transcription.

For more details, you can refer to the OpenAI API documentation for further guidance on configuring modalities.

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:

#

@cosmic yoke Kapa.ai is still learning and improving, please let me know how I did by reacting below.

cosmic yoke
#

I cannot put only audio in modalities; the API throws this error: "Invalid modalities: ['audio']. Supported combinations are: ['text'] and ['audio', 'text']."
@lusty pagoda

lusty pagoda BOT
#

Let me check the knowledge base for you again. One moment! :books:

#

It seems that the OpenAI Realtime API requires specific combinations of modalities. According to the information provided, the supported combinations for modalities are ["text"] and ["audio", "text"]. Unfortunately, using only ["audio"] is not supported, which is why you are encountering the error.

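To proceed, use one of the two supported combinations. Your original ["text"] setting was already valid, so the error only rules out ["audio"] on its own. The modalities setting describes what the model may respond with; it does not control whether input audio transcription runs, so neither combination by itself stops responses from being generated. If you also want audio responses, include both "audio" and "text":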

$post_fields = [
    "model" => "gpt-4o-realtime-preview-2024-12-17",
    "modalities" => ["audio", "text"], // Use both audio and text
    "input_audio_transcription" => [
        "model" => 'whisper-1',
    ],
    "turn_detection" => [
        'type' => 'server_vad',
        'silence_duration_ms' => 1
    ]
];

While this configuration includes both modalities, you can focus on processing the audio transcription results and choose to ignore the text completions if they are not needed for your application.
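One way to approach this is sketched below, under stated assumptions rather than as a confirmed fix. It assumes the "turn_detection" object accepts a "create_response" flag in the API version you are using, and that transcripts arrive as server events of type "conversation.item.input_audio_transcription.completed" with a "transcript" field. The helper name extractTranscript is hypothetical and only illustrates keeping transcription events while ignoring everything else.

```php
<?php
// Sketch only: "create_response" is assumed to be accepted under
// "turn_detection" by the API version in use; verify before relying on it.
$post_fields = [
    "model" => "gpt-4o-realtime-preview-2024-12-17",
    "modalities" => ["text"],
    "input_audio_transcription" => [
        "model" => "whisper-1",
    ],
    "turn_detection" => [
        "type" => "server_vad",
        "silence_duration_ms" => 1,
        // Assumption: false stops the server from automatically generating
        // a model response after each detected end of speech.
        "create_response" => false,
    ],
];

/**
 * Hypothetical helper: returns the transcript carried by a decoded server
 * event, or null for any other event (including model response events).
 */
function extractTranscript(array $event): ?string
{
    $type = $event["type"] ?? "";
    if ($type !== "conversation.item.input_audio_transcription.completed") {
        return null;
    }
    return $event["transcript"] ?? null;
}
```

If the flag is honored, no response should be created after each speech segment, and the helper is then only a safety net for filtering the event stream down to transcripts.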

For more information, you can refer to the OpenAI API documentation for further guidance on configuring modalities and handling API responses.


cosmic yoke
#

But generating text completions will increase the API cost, right? @lusty pagoda

lusty pagoda BOT
#


Yes, generating text completions can increase the API cost because the cost is typically based on the number of tokens processed, which includes both input and output tokens. When you enable both audio and text modalities, the API will process and potentially generate text completions, which can contribute to the overall token usage and thus increase the cost.

To manage costs, you can:

  1. Monitor Token Usage: Keep track of the number of tokens used in your API requests. This can help you understand how much each request is costing and identify ways to optimize usage.

  2. Limit Output Tokens: If you must include text completions, consider setting a limit on the maximum number of output tokens. This can help control costs by preventing the generation of unnecessarily long text completions.

  3. Review API Logs: Use the API's logging and usage tracking features to review how tokens are being used and identify any areas where you might reduce usage.
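The first two points above can be sketched roughly as follows. This is an illustration under assumptions, not a definitive implementation: it assumes the session accepts a "max_response_output_tokens" setting and that each "response.done" server event reports token counts under response.usage. The helper name addUsage and the value 256 are hypothetical.

```php
<?php
// Sketch only: a cap on response length to merge into the session config.
// Assumption: "max_response_output_tokens" is accepted by the endpoint in use.
$cost_limits = [
    "max_response_output_tokens" => 256, // illustrative value, not a recommendation
];

/**
 * Hypothetical helper: adds the token counts reported by one decoded
 * "response.done" event to a running total; other events leave it unchanged.
 */
function addUsage(array $totals, array $event): array
{
    if (($event["type"] ?? "") !== "response.done") {
        return $totals;
    }
    $usage = $event["response"]["usage"] ?? [];
    foreach (["input_tokens", "output_tokens"] as $key) {
        $totals[$key] = ($totals[$key] ?? 0) + (int) ($usage[$key] ?? 0);
    }
    return $totals;
}

// Usage: feed every decoded server event through the helper.
$totals = ["input_tokens" => 0, "output_tokens" => 0];
$totals = addUsage($totals, [
    "type" => "response.done",
    "response" => ["usage" => ["input_tokens" => 120, "output_tokens" => 30]],
]);
```

Keeping the running total in a plain array makes it easy to log per session and compare against what the usage dashboard reports.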

For more detailed information on managing costs and understanding token usage, you can refer to the OpenAI API documentation.
