Does the voice mode (not the new one) use Whisper? If so, do we know which model? The voice-to-text feature (microphone icon) works extremely well, and I'm assuming it's running on Whisper large-v3 or some other very good model. But the voice-to-voice feature (headphones icon) seems much worse at understanding the same sayings and phrases. This had me wondering whether it isn't Whisper, or is a worse Whisper model, and if so, why they would use a worse version in one area but not the other.
#whisper voice model
Hi @digital gust. I'm kapa.ai, your friendly support bot. I will try to answer your question by searching through the knowledge base. :wave:
Based on the information provided in the knowledge sources, the Whisper model developed by OpenAI is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and can perform multilingual speech recognition, speech translation, and language identification. The Whisper v2-large model, referred to as "whisper-1", is available through the API and is optimized for faster performance (source: https://platform.openai.com/docs/models).
However, the knowledge sources do not provide specific information about the use of the Whisper model in the voice mode you mentioned. It's also not clear which specific model is used for the voice-to-voice (headphones icon) feature.
The performance of speech recognition can vary based on several factors, including the quality of the audio input, background noise, and the specific phrases or words used. It's also possible that different models or versions of models are used in different parts of the system for various reasons, such as computational efficiency, latency requirements, or specific capabilities of the models.
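To make the efficiency trade-off concrete, here is a minimal sketch and not a description of how the app actually works. The parameter counts are the approximate figures listed for the open-source Whisper checkpoints in the openai/whisper README; the helper `pick_checkpoint` and the idea of a parameter budget standing in for a latency requirement are assumptions made purely for illustration.

```python
# Approximate parameter counts (in millions) for the open-source Whisper
# checkpoints, as listed in the openai/whisper README.
WHISPER_PARAMS_M = {
    "tiny": 39,
    "base": 74,
    "small": 244,
    "medium": 769,
    "large": 1550,
}


def pick_checkpoint(max_params_m):
    """Return the largest checkpoint that fits the budget (hypothetical helper).

    A smaller budget stands in for a tighter latency requirement; fewer
    parameters generally means faster but less accurate transcription.
    """
    fitting = [name for name, size in WHISPER_PARAMS_M.items() if size <= max_params_m]
    if not fitting:
        raise ValueError("no checkpoint fits the given budget")
    return max(fitting, key=WHISPER_PARAMS_M.get)
```

Under these assumptions, a real-time voice conversation with a tight budget would land on a smaller checkpoint than a one-off dictation request, which could plausibly explain a difference in recognition quality.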
For more detailed information about the Whisper model, including its training process and capabilities, you can refer to the Whisper paper (https://arxiv.org/abs/2212.04356) and the Whisper model card (https://github.com/openai/whisper/blob/main/model-card.md).
Yes, I think it's a smaller version of Whisper to optimize speed, though.