# Text to voice ChatGPT 4o?

heavy tinsel

Are there any new features for voice to audio with ChatGPT 4o?

proven kayak (BOT)

Hi @heavy tinsel. I'm kapa.ai, your friendly support bot. I will try to answer your question by searching through the knowledge base. :wave:

Yes, OpenAI is actively working on introducing audio capabilities to GPT-4o. As of now, GPT-4o supports text and image inputs, with audio support expected to be introduced soon. This will allow the model to handle a combination of text, audio, and video inputs, and generate outputs in text, audio, and image formats source (https://cookbook.openai.com/examples/gpt4o/introduction_to_gpt4o).

In the meantime, ChatGPT continues to support voice interactions via the pre-existing Voice Mode feature. This feature allows users to engage in a back-and-forth conversation with the assistant using voice source (https://help.openai.com/en/articles/7102672-how-can-i-access-gpt-4-gpt-4-turbo-and-gpt-4o).

GPT-4o real-time voice and vision will be rolling out to a limited Alpha for ChatGPT Plus users in a few weeks and will be widely available for ChatGPT Plus users over the coming months source (https://help.openai.com/en/articles/8400625-voice-chat-faq#h_686ac13528).

Please note that the exact implementation details and pricing for the audio capabilities of GPT-4o have not been disclosed yet. For the latest updates, you can follow the official announcements on the OpenAI community forum source (https://community.openai.com/t/744455).

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:

heavy tinsel

@proven kayak I mean, has text-to-voice output improved at all? Specifically when making API calls.

proven kayak (BOT)

Let me check the knowledge base for you. One moment! :books:

OpenAI's text-to-speech (TTS) API allows developers to generate high-quality spoken audio from text. The API initially offered six preset voices and two model variants, tts-1 and tts-1-hd. The tts-1 model is optimized for real-time use cases, while tts-1-hd is optimized for quality source (https://help.openai.com/en/articles/8555505-tts-api).
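To make the tts-1 versus tts-1-hd trade-off concrete, here's a minimal sketch that picks the model from a single latency-versus-quality flag. The helper name choose_tts_options is hypothetical, and the voice tuple assumes the six launch voices (alloy, echo, fable, onyx, nova, shimmer); later releases may offer more.

```python
# Assumption: the six preset voices the TTS API launched with; later releases may add more.
PRESET_VOICES = ("alloy", "echo", "fable", "onyx", "nova", "shimmer")


def choose_tts_options(realtime: bool, voice: str = "alloy") -> dict:
    """Hypothetical helper returning model/voice arguments for audio.speech.create()."""
    if voice not in PRESET_VOICES:
        raise ValueError(f"unknown voice {voice!r}; expected one of {PRESET_VOICES}")
    # tts-1 is optimized for real-time use, tts-1-hd for audio quality.
    model = "tts-1" if realtime else "tts-1-hd"
    return {"model": model, "voice": voice}


# A live voice assistant would favor latency; an audiobook export would favor quality.
print(choose_tts_options(realtime=True))                 # {'model': 'tts-1', 'voice': 'alloy'}
print(choose_tts_options(realtime=False, voice="nova"))  # {'model': 'tts-1-hd', 'voice': 'nova'}
```

The returned dict can be unpacked straight into a speech request together with the input text.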

As for improvements, the changelog and forum posts I found do not mention any recent updates or enhancements to text-to-speech. The most recent major update they mention is the GPT-4 Turbo Preview release, and it does not include any text-to-speech changes source (https://platform.openai.com/docs/changelog).

To generate spoken audio in a different language, provide the input text in that language; there is no separate language setting source (https://community.openai.com/t/655426).
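Because the spoken language follows the input text, switching languages only means changing the input; the request carries no language parameter. A minimal sketch, assuming the official openai Python package and an OPENAI_API_KEY environment variable; the sample sentences, the speech_kwargs and synthesize_all helper names, and the output file names are all hypothetical.

```python
# Hypothetical sample sentences; the voice reads whatever language the input is in.
SAMPLES = {
    "es": "Hoy es un día maravilloso para construir algo que la gente ame.",
    "fr": "Aujourd'hui est une journée merveilleuse pour créer quelque chose que les gens aiment.",
}


def speech_kwargs(text: str, voice: str = "alloy") -> dict:
    """Hypothetical helper: arguments for audio.speech.create(); note there is no language field."""
    return {"model": "tts-1", "voice": voice, "input": text}


def synthesize_all(samples: dict[str, str]) -> None:
    """Write one MP3 per language, e.g. speech_es.mp3 and speech_fr.mp3."""
    from openai import OpenAI  # imported lazily so speech_kwargs works without the SDK

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    for lang, text in samples.items():
        with client.audio.speech.with_streaming_response.create(**speech_kwargs(text)) as response:
            response.stream_to_file(f"speech_{lang}.mp3")
```

Calling synthesize_all(SAMPLES) would produce one file per language using the same voice.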

Here's an example of how to use the TTS API:

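from openai import AsyncOpenAI

async_openai_client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

async def synthesize(text: str) -> bytes:
    response = await async_openai_client.audio.speech.create(
        model="tts-1-hd-1106",
        voice="nova",
        input=text,
        response_format="opus",  # Opus audio instead of the default MP3
    )
    return response.content  # the generated audio as raw bytes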

For the most accurate and up-to-date information, it's recommended to check the official OpenAI documentation and community forums regularly.

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:

@heavy tinsel kapa.ai is still learning and improving. Please let us know how it did by reacting below.

heavy tinsel

@proven kayak I mean text-to-speech.

proven kayak (BOT)

Let me check the knowledge base for you. One moment! :books:

OpenAI's Text-to-Speech (TTS) API allows developers to generate high-quality spoken audio from text. The API offers six preset voices and two model variants: tts-1 and tts-1-hd. The tts-1 model is optimized for real-time use cases, while tts-1-hd is optimized for quality. The pricing starts at $0.015 per 1,000 input characters source (https://help.openai.com/en/articles/8555505-tts-api).
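Since billing is per input character, estimating a request's cost is simple arithmetic: characters / 1,000 × rate. For example, a maximum-size 4,096-character request at the quoted starting rate costs 4.096 × $0.015 ≈ $0.061. Here's a minimal sketch, assuming the $0.015 rate and that billed characters match Python's len() (Unicode code points); the helper name is hypothetical, and other models may be priced differently.

```python
from decimal import Decimal

# Starting rate quoted in the help article: $0.015 per 1,000 input characters.
PRICE_PER_1K_CHARS = Decimal("0.015")


def estimate_tts_cost(text: str, price_per_1k: Decimal = PRICE_PER_1K_CHARS) -> Decimal:
    """Hypothetical helper: cost = characters / 1,000 * rate (assumes len() matches billing)."""
    return Decimal(len(text)) / 1000 * price_per_1k


print(estimate_tts_cost("x" * 4096))  # 0.061440
```

Decimal is used instead of float so that small per-request amounts add up without rounding drift.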

You can use the TTS API through the audio/speech endpoint with an OpenAI API account. Rate limits begin at 50 RPM for paid accounts, and each request accepts at most 4,096 input characters, roughly 5 minutes of audio at default speed. You can also stream the audio by setting stream=True, so that it arrives in chunks instead of as one complete file; in the Python SDK, streaming is exposed through client.audio.speech.with_streaming_response.create(...) source (https://help.openai.com/en/articles/8555505-tts-api).
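Text longer than the 4,096-character limit has to be sent as several requests. Here's a minimal sketch of one way to split it, preferring sentence boundaries and hard-splitting only a sentence that exceeds the limit on its own. The helper name and the simple regex sentence splitter are assumptions, not part of the API; each chunk would be sent as its own speech request and the resulting audio played in order.

```python
import re

MAX_INPUT_CHARS = 4096  # per-request input limit quoted in the help article


def split_for_tts(text: str, limit: int = MAX_INPUT_CHARS) -> list[str]:
    """Hypothetical helper: split text into chunks of at most `limit` characters.

    Sentences end at '.', '!' or '?' followed by whitespace; whitespace between
    sentences is normalized to a single space inside each chunk.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut into limit-sized pieces.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate  # the sentence still fits in the current chunk
        else:
            chunks.append(current)  # close the full chunk and start a new one
            current = sentence
    if current:
        chunks.append(current)
    return chunks


print(split_for_tts("One. Two. Three.", limit=9))  # ['One. Two.', 'Three.']
```

Splitting at sentence ends keeps each audio file from stopping mid-sentence; the hard split is only a fallback and may cut a word in half.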

Here's an example of how to use the TTS API:

from pathlib import Path
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

speech_file_path = Path(__file__).parent / "speech.mp3"

# Stream the generated audio (MP3 by default) straight to the file
with client.audio.speech.with_streaming_response.create(
    model="tts-1",
    voice="alloy",
    input="Today is a wonderful day to build something people love!",
) as response:
    response.stream_to_file(speech_file_path)

source (https://platform.openai.com/docs/guides/text-to-speech)

This code saves an MP3 file in which the voice chosen with the voice parameter speaks the text passed as input. Depending on your use case, you may want to use tts-1-hd instead of tts-1 for higher quality source (https://community.openai.com/t/707521).
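The SDK call is a thin wrapper around an HTTP POST to the audio/speech endpoint, so the same request can be made with only the Python standard library. Here's a minimal sketch, assuming the endpoint URL https://api.openai.com/v1/audio/speech, Bearer-token authentication with an OPENAI_API_KEY environment variable, and a JSON body with the same model, voice, and input fields; the helper names are hypothetical.

```python
import json
import os
import urllib.request
from typing import Optional

# Assumed URL of the audio/speech endpoint.
SPEECH_URL = "https://api.openai.com/v1/audio/speech"


def build_speech_request(
    text: str,
    model: str = "tts-1",
    voice: str = "alloy",
    api_key: Optional[str] = None,
) -> urllib.request.Request:
    """Hypothetical helper: build (but do not send) the POST request for the speech endpoint."""
    key = api_key if api_key is not None else os.environ.get("OPENAI_API_KEY", "")
    body = json.dumps({"model": model, "voice": voice, "input": text}).encode("utf-8")
    return urllib.request.Request(
        SPEECH_URL,
        data=body,
        method="POST",
        headers={"Authorization": f"Bearer {key}", "Content-Type": "application/json"},
    )


def save_speech(text: str, path: str = "speech.mp3") -> None:
    """Hypothetical helper: send the request and write the returned audio (MP3 by default)."""
    with urllib.request.urlopen(build_speech_request(text)) as response, open(path, "wb") as out:
        out.write(response.read())
```

Keeping request construction separate from sending makes it easy to inspect the exact payload before spending API credits; save_speech("Today is a wonderful day to build something people love!") would write speech.mp3 to the current directory.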

However, the provided sources do not mention any recent improvements or updates to the TTS API. For the most accurate and up-to-date information, it's recommended to check the official OpenAI documentation and community forums regularly.

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon: