#Whisper: Real-time transcription?


austere gazelle

Hi, is it possible to do real-time transcription using the open-source Whisper model?

unique basin

Good question. I'm also interested in whether it's possible.

flint copper

Hello! Based on my understanding, it seems that Whisper does not support real-time transcription as it requires the entire audio clip to be sent to the API for transcription. Additionally, Whisper does not currently support data stream functionality like Chat Completion does. Therefore, it may not be possible to perform real-time transcription using Whisper at this time.

austere gazelle

Yeah, but one possible workaround would be to take the input in 3-5 second chunks and transcribe each one as it arrives (incrementally transcribing the audio every few seconds).
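
A minimal sketch of that chunking idea, assuming the audio is already available as a flat sequence of samples. The names `chunk_audio` and `transcribe_incrementally` are hypothetical, and `transcribe` is a caller-supplied function standing in for whatever model call is used:

```python
from typing import Callable, Iterator, List, Sequence


def chunk_audio(samples: Sequence[float], sample_rate: int,
                chunk_seconds: float) -> Iterator[Sequence[float]]:
    """Yield consecutive fixed-length windows of audio samples."""
    chunk_size = int(sample_rate * chunk_seconds)
    if chunk_size <= 0:
        raise ValueError("chunk must contain at least one sample")
    for start in range(0, len(samples), chunk_size):
        # The last window may be shorter than chunk_size.
        yield samples[start:start + chunk_size]


def transcribe_incrementally(samples: Sequence[float], sample_rate: int,
                             chunk_seconds: float,
                             transcribe: Callable[[Sequence[float]], str]) -> List[str]:
    """Transcribe each chunk in order and collect the partial transcripts.

    `transcribe` is an assumption: any callable that maps one chunk of
    samples to text. Chunks are cut at fixed offsets, so a word can be
    split across two chunks.
    """
    partials = []
    for chunk in chunk_audio(samples, sample_rate, chunk_seconds):
        partials.append(transcribe(chunk))
    return partials
```

With the open-source `whisper` package, `transcribe` could plausibly wrap `model.transcribe(...)["text"]` on a 16 kHz float32 array, but that wiring is not shown here and would need checking against the package docs.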

flint copper

Ah! I understand the direction you are considering, but allow me to provide some insights that may be useful.

  1. Currently, the rate limit for the Whisper API is 50 requests per minute. Your proposed workaround of sending 3-5 second audio chunks should fit within that, because a single user only needs 12-20 requests per minute. It only holds for a small number of simultaneous users, though: at 3-second chunks (20 requests per minute each), two users already use 40 of the 50 requests, and a third would put your application at risk of being rate-limited.

  2. If we send requests at a 5-second interval, the transcription can no longer really be considered "real-time." Each piece of text would be delayed by up to 5 seconds, plus the time it takes to send the audio chunk to the API and receive the result back. Any network problems would add further delay.

  3. Unless you have found a way to ensure that the audio chunks contain complete sentences and do not cut off in the middle of a word or sentence, it may be impossible to achieve real-time transcription using this approach.
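
The arithmetic behind points 1 and 2 can be sketched as follows. The function names are hypothetical, the 50 requests per minute figure is the one quoted above, and the round-trip time is an assumed placeholder value:

```python
import math


def requests_per_minute(chunk_seconds: float) -> float:
    """Requests one user generates per minute when sending one chunk per request."""
    return 60.0 / chunk_seconds


def max_concurrent_users(rate_limit_per_minute: int, chunk_seconds: float) -> int:
    """Largest number of simultaneous users that stays within the rate limit."""
    return math.floor(rate_limit_per_minute * chunk_seconds / 60.0)


def worst_case_delay(chunk_seconds: float, round_trip_seconds: float) -> float:
    """Delay for the first word of a chunk: wait for the chunk to fill, then the API round trip."""
    return chunk_seconds + round_trip_seconds


print(requests_per_minute(3), requests_per_minute(5))
print(max_concurrent_users(50, 3), max_concurrent_users(50, 5))
print(worst_case_delay(5, 1.5))  # 1.5 s round trip is an assumed value
```

Under these assumptions, 3-second chunks allow two concurrent users and 5-second chunks allow four, with longer chunks trading capacity for added delay.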

I hope this information is helpful! Please let me know if you have any further questions or concerns.

austere gazelle

Thanks for the detailed response.
Yes, you are correct. Whisper does not accept a live input stream, so 100% real-time transcription is not possible.
About the 50 requests per minute: are you talking about the whisperV2 API (paid)? I am currently using the open-source Whisper model, and I am not aware of any such limitation there.
Thanks