#whisper ai

subtle garden
#

Do you know if OpenAI plans on releasing a second version of Whisper?

shell onyxBOT
#

Hi @subtle garden. I'm kapa.ai, your friendly support bot. I will try to answer your question by searching through the knowledge base. :wave:

#

Yes, OpenAI has indeed released a second version of the Whisper model, known as large-v2. This version was trained for 2.5 times more epochs and includes additional regularization techniques such as SpecAugment, stochastic depth, and BPE dropout. The architecture and size of the model remained the same as the original large model, which is now renamed to large-v1. The large-v2 model shows improved performance in transcription, translation, and language identification compared to the large-v1 model.

In addition to large-v2, OpenAI also released a large-v3 model in November 2023.

You can upgrade to the latest version of the Whisper package with the following command:

pip install --upgrade --no-deps --force-reinstall git+https://github.com/openai/whisper.git

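After upgrading to the release that introduced it, calling whisper.load_model("large") loaded the large-v2 model. In current releases of the package, the name large aliases to the latest model in the series, large-v3, so pass "large-v2" explicitly if you want that specific version.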
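
Because the "large" alias has pointed at different checkpoints over time, it can be safer to request a versioned name explicitly. The following is a minimal sketch, not an official recipe: require_checkpoint and load_pinned are hypothetical helper names, while whisper.load_model and whisper.available_models are the calls exposed by the open-source package. The third-party import is kept inside the function so the helper logic can be checked without installing the package.

```python
from typing import Iterable


def require_checkpoint(name: str, available: Iterable[str]) -> str:
    """Return `name` if it is listed in `available`, else raise ValueError."""
    names = list(available)
    if name not in names:
        raise ValueError(f"{name!r} is not offered; available: {names}")
    return name


def load_pinned(name: str = "large-v2"):
    """Load an explicitly versioned checkpoint instead of the moving "large" alias."""
    # Third-party dependency, imported lazily: pip install -U openai-whisper
    import whisper

    # available_models() lists the names the installed package version knows about.
    return whisper.load_model(require_checkpoint(name, whisper.available_models()))
```

Calling load_pinned("large-v2") downloads the checkpoint on first use, so expect a multi-gigabyte download for the large models.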

For more details, you can refer to the official announcement (https://github.com/openai/whisper/discussions/661) and the model card (https://github.com/openai/whisper/blob/main/model-card.md) on GitHub.

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:

#

@subtle garden kapa.ai is still learning and improving, please let us know how it did by reacting below

subtle garden
#

@shell onyx do they use the voice recordings from our chats in the GPT app to improve the models?

shell onyxBOT
#

Let me check the knowledge base for you. One moment! :books:

#

OpenAI does not train its models on audio clips from voice chats unless you choose to share your audio for them to improve voice chats for everyone. If you share your audio, then they may use audio from your voice chats to train their models. Transcribed chats may also be used to train their models depending on your choices and plan. You can learn more about your choices in this article (https://help.openai.com/en/articles/5722486-how-your-data-is-used-to-improve-model-performance) [^1^].

Please note that OpenAI respects your privacy and is committed to keeping secure any information they obtain from you or about you. Their use of data is governed by their customer agreements covering access to and use of their offerings [^2^].

[^1^]: (Source: https://help.openai.com/en/articles/8400625-voice-chat-faq#h_79ef19aed3)
[^2^]: (Source: https://openai.com/policies/privacy-policy)

subtle garden
#

@shell onyx what are the limitations?

shell onyxBOT
#

OpenAI's models, including the GPT series and the vision model, have certain limitations. Here are some of them:

GPT-4 Limitations:

  1. Hallucination: The model may generate information that was not in the prompt or instructions, or produce incorrect outputs.
  2. It may not be suitable for every use case without significant modifications. For example, it may produce offensive outputs or biased results.
  3. Performance can vary widely across inputs, so evaluate the model thoroughly on data representative of your use case.

Vision Model Limitations:

  1. The model may generate incorrect descriptions or captions in certain scenarios.
  2. It struggles with tasks requiring precise spatial localization, such as identifying chess positions.
  3. The model is not suitable for interpreting specialized medical images like CT scans and shouldn't be used for medical advice.
  4. It may not perform optimally when handling images with text in non-Latin alphabets, such as Japanese or Korean.
  5. The model may misinterpret rotated or upside-down text or images.
  6. It may struggle to understand graphs or text where colors or styles like solid, dashed, or dotted lines vary.
  7. The model struggles with panoramic and fisheye images.
  8. It doesn't process original file names or metadata, and images are resized before analysis, affecting their original dimensions.
  9. It may give approximate counts for objects in images.
  10. For safety reasons, a system has been implemented to block the submission of CAPTCHAs.

subtle garden
#

@shell onyx I meant the limitations of the whisper model

shell onyxBOT
#

The Whisper model developed by OpenAI has several limitations, as outlined in the model card (https://github.com/openai/whisper/blob/main/model-card.md). Here are some of them:

  1. Hallucination: The models are trained in a weakly supervised manner using large-scale noisy data, which can lead to the models generating texts that are not actually spoken in the audio input. This is referred to as hallucination.

  2. Uneven Performance Across Languages: The models perform unevenly across languages, with lower accuracy observed on low-resource and/or low-discoverability languages or languages where there is less training data.

  3. Disparate Performance on Accents and Dialects: The models also exhibit disparate performance on different accents and dialects of particular languages, which may include a higher word error rate across speakers of different genders, races, ages, or other demographic criteria.

  4. Repetitive Texts: The sequence-to-sequence architecture of the model makes it prone to generating repetitive texts. This can be mitigated to some degree by beam search and temperature scheduling but not perfectly.

  5. Limitations on Audio Length: The Whisper model operates on 30 seconds of audio at a time. Longer recordings are handled by transcribing them window by window, which is what the transcribe() function in the open-source package does with a sliding 30-second window.

  6. Potential Misuse: There are also potential dual-use concerns with Whisper. While the technology is intended for beneficial purposes, it could potentially be used to build capable surveillance technologies or scale up existing surveillance efforts.
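
The hallucination and repetition items above can be partly screened for after the fact. One common heuristic is to check how well the transcribed text compresses: highly repetitive output compresses far better than ordinary speech. The sketch below uses only the standard library; looks_repetitive is a hypothetical helper name, and the 2.4 threshold is an assumption chosen to mirror what is understood to be the package's default compression-ratio cutoff, so tune it on your own data.

```python
import zlib


def compression_ratio(text: str) -> float:
    """Ratio of raw UTF-8 size to zlib-compressed size; higher means more repetitive."""
    data = text.encode("utf-8")
    return len(data) / len(zlib.compress(data))


def looks_repetitive(text: str, threshold: float = 2.4) -> bool:
    """Flag text whose compression ratio exceeds `threshold` (an assumed cutoff)."""
    return compression_ratio(text) > threshold
```

A segment flagged this way is a candidate for re-decoding at a higher temperature or for manual review, not proof of an error.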

#

It's important to note that these limitations are part of ongoing research and development, and OpenAI continues to work on improving these models.
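
To make the 30-second limit concrete, here is a simplified sketch of how a longer recording breaks down into windows. It assumes fixed, back-to-back windows; the open-source transcribe() function is understood to advance its window based on the timestamps it decodes, so real boundaries will generally differ. window_bounds is a hypothetical helper, not part of the Whisper package.

```python
from typing import List, Tuple

WINDOW_SECONDS = 30.0  # window length the model operates on


def window_bounds(
    duration_s: float, window_s: float = WINDOW_SECONDS
) -> List[Tuple[float, float]]:
    """Split [0, duration_s] into consecutive windows of at most `window_s` seconds."""
    if duration_s < 0 or window_s <= 0:
        raise ValueError("duration must be >= 0 and window must be > 0")
    bounds = []
    start = 0.0
    while start < duration_s:
        # The final window is shorter when the duration is not a multiple of window_s.
        end = min(start + window_s, duration_s)
        bounds.append((start, end))
        start = end
    return bounds
```

For example, a 75-second clip yields three windows: two full 30-second windows and a final 15-second one.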

worn shadow
#

ball

royal bramble
#

"While the technology is intended for beneficial purposes, it could potentially be used to build capable surveillance technologies or scale up existing surveillance efforts."

reef abyss
#

@shell onyx is large-v3 available to use through the OpenAI API, or only some other way?

shell onyxBOT
#

Yes, the large-v3 model of Whisper is available to use through the open-source Whisper package. You can update the package and load the large-v3 model using the following commands:

To update the Whisper package:

pip install -U openai-whisper

To load the model:

import whisper
model = whisper.load_model("large-v3")

Please note that the name large now aliases to the latest model in the series, large-v3.

For more details, you can refer to the official announcement (https://github.com/openai/whisper/discussions/1762) on GitHub.

reef abyss
#

The OpenAI API reference says: "model (string, required): ID of the model to use. Only whisper-1 (which is powered by our open source Whisper V2 model) is currently available."
Does it mean it doesn't use the latest version of Whisper? @shell onyx

shell onyxBOT
#

Yes, according to the information provided, the OpenAI API currently uses the whisper-1 model, which is powered by the open-source Whisper V2 model. This means that as of the latest update, the API does not use the latest large-v3 version of the Whisper model.

However, OpenAI regularly updates its models and APIs, so it's possible that they may update the API to use the large-v3 model in the future. You can keep an eye on the official OpenAI API documentation (https://platform.openai.com/docs/models) for the latest updates.

For now, if you want to use the large-v3 model, you can do so by updating the Whisper package and loading the model locally, as mentioned in the official announcement (https://github.com/openai/whisper/discussions/1762) on GitHub.
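
The two routes described above can be put side by side. This is a sketch under assumptions, not an official integration: the function names and the BACKENDS table are hypothetical, the local path assumes the openai-whisper package is installed, and the hosted path assumes the openai Python client with an OPENAI_API_KEY set in the environment. Third-party imports are deferred into the functions so the dispatch logic can be exercised on its own.

```python
def transcribe_locally(path: str, model_name: str = "large-v3") -> str:
    """Transcribe with the open-source package, which can load large-v3."""
    import whisper  # third-party: pip install -U openai-whisper

    model = whisper.load_model(model_name)
    return model.transcribe(path)["text"]


def transcribe_via_api(path: str) -> str:
    """Transcribe with the hosted API, where only whisper-1 is offered."""
    from openai import OpenAI  # third-party: pip install openai

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    with open(path, "rb") as audio_file:
        result = client.audio.transcriptions.create(model="whisper-1", file=audio_file)
    return result.text


# Hypothetical lookup table mapping a backend label to its implementation.
BACKENDS = {"local": transcribe_locally, "api": transcribe_via_api}


def transcribe(path: str, backend: str = "local") -> str:
    """Dispatch to the chosen backend; raise ValueError for unknown labels."""
    if backend not in BACKENDS:
        raise ValueError(f"unknown backend {backend!r}; choose from {sorted(BACKENDS)}")
    return BACKENDS[backend](path)
```

Running locally trades API cost for your own compute: the large checkpoints need a capable GPU to run at a practical speed.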
