#whisper bad


tall bison

How come Whisper AI timestamps are worse than Premiere Pro's subtitle generation?

craggy summit (BOT)

Hi @tall bison. I'm kapa.ai, your friendly support bot. I will try to answer your question by searching through the knowledge base. :wave:


Whisper, OpenAI's automatic speech recognition (ASR) system, is a powerful tool for transcribing audio and generating subtitles. However, it is not explicitly trained to predict word-level timestamps. The current word timings are produced by an inference-time trick (aligning the model's cross-attention weights to the audio with dynamic time warping), so they may not be perfectly accurate, especially around pauses (source (https://github.com/openai/whisper/discussions/1855)).
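To show why pauses are a weak spot, here is a toy sketch of dynamic time warping (DTW), the kind of alignment Whisper uses to turn attention patterns into word timings. This is not Whisper's code: the cost matrix is made up by hand (in Whisper the costs come from cross-attention weights), and the 0.02 s frame step is an assumption. The silent frames match both words equally badly, so where the boundary lands inside the pause is essentially arbitrary.

```python
from typing import List, Optional, Tuple

# Toy DTW illustration (hand-made costs, NOT Whisper's implementation).


def dtw_path(cost: List[List[float]]) -> List[Tuple[int, int]]:
    """Return the minimum-cost monotonic path from (0, 0) to (rows-1, cols-1).

    Rows are text tokens and columns are audio frames. Each step moves right,
    down, or diagonally, so tokens and frames both stay in order.
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    # acc[i][j] = cheapest total cost of a path ending at token i-1, frame j-1
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i][j] = cost[i - 1][j - 1] + min(
                acc[i - 1][j - 1],  # diagonal: next token and next frame
                acc[i - 1][j],      # down: next token on the same frame
                acc[i][j - 1],      # right: same token over another frame
            )
    # Walk back from the bottom-right corner, taking the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (acc[i - 1][j - 1], i - 1, j - 1),
            (acc[i - 1][j], i - 1, j),
            (acc[i][j - 1], i, j - 1),
        )
    path.reverse()
    return path


def token_start_frames(path: List[Tuple[int, int]], n_tokens: int) -> List[Optional[int]]:
    """The first frame at which the path reaches a token is that token's start."""
    starts: List[Optional[int]] = [None] * n_tokens
    for token, frame in path:
        if starts[token] is None:
            starts[token] = frame
    return starts


# Lower cost = better match. "hello" fits frames 0-1, "world" fits frames 4-5,
# and frames 2-3 are silence that matches both words equally badly.
cost = [
    [1, 1, 5, 5, 9, 9],  # token 0: "hello"
    [9, 9, 5, 5, 1, 1],  # token 1: "world"
]
FRAME_SECONDS = 0.02  # assumed frame step for this illustration

path = dtw_path(cost)
starts = token_start_frames(path, len(cost))
print(path)    # [(0, 0), (0, 1), (0, 2), (0, 3), (1, 4), (1, 5)]
print(starts)  # [0, 4]
print([s * FRAME_SECONDS for s in starts])
```

In this run the two silent frames are attached to "hello", stretching its end time over the pause. An equally cheap path assigns them to "world" instead, which would make "world" appear to start early.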

Several users have reported issues with the alignment of subtitles with the timeline, precision of subtitle timings, and the accuracy of timestamps (source (https://github.com/openai/whisper/discussions/1995), source (https://github.com/openai/whisper/discussions/1147), source (https://github.com/openai/whisper/discussions/809), source (https://github.com/openai/whisper/discussions/89), source (https://github.com/openai/whisper/discussions/1046), source (https://github.com/openai/whisper/discussions/124), source (https://github.com/openai/whisper/discussions/1720)).

To improve timestamp accuracy, you can try running Whisper with the `--word_timestamps True` option. This adds word-level timestamps, which Whisper also uses to refine the segment timestamps (source (https://github.com/openai/whisper/discussions/1995), source (https://github.com/openai/whisper/discussions/1147), source (https://github.com/openai/whisper/discussions/1855)).
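A minimal sketch of using word timestamps from Python, assuming the `openai-whisper` package, where `model.transcribe(..., word_timestamps=True)` adds a `words` list (with `word`, `start`, `end`) to each segment. The helpers `words_to_srt` and `srt_time`, the `max_gap`/`max_words` thresholds, and the sample result are hypothetical, for illustration only: they rebuild subtitle cues and start a new cue at long pauses, which is one way to keep cues from spanning silences.

```python
from typing import Dict, List


def transcribe_with_words(audio_path: str, model_name: str = "base") -> Dict:
    """Run Whisper with word-level timestamps (needs `pip install openai-whisper`)."""
    import whisper  # imported lazily so the helpers below run without Whisper installed

    model = whisper.load_model(model_name)
    # Equivalent to the CLI flag --word_timestamps True: segments gain a "words" list.
    return model.transcribe(audio_path, word_timestamps=True)


def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def words_to_srt(result: Dict, max_gap: float = 0.5, max_words: int = 7) -> str:
    """Regroup word timestamps into SRT cues (hypothetical helper).

    A new cue starts when the silence before a word exceeds max_gap seconds or
    the current cue already holds max_words words. Both thresholds are arbitrary.
    """
    words = [w for seg in result["segments"] for w in seg.get("words", [])]
    cues: List[List[Dict]] = []
    for w in words:
        if cues and w["start"] - cues[-1][-1]["end"] <= max_gap and len(cues[-1]) < max_words:
            cues[-1].append(w)
        else:
            cues.append([w])
    blocks = []
    for index, cue in enumerate(cues, start=1):
        # Word strings usually carry a leading space, so join directly and strip.
        text = "".join(w["word"] for w in cue).strip()
        blocks.append(f"{index}\n{srt_time(cue[0]['start'])} --> {srt_time(cue[-1]['end'])}\n{text}\n")
    return "\n".join(blocks)


# Hand-written stand-in for a transcribe() result, so the sketch runs offline.
sample_result = {
    "segments": [
        {
            "words": [
                {"word": " Hello", "start": 0.0, "end": 0.4},
                {"word": " there.", "start": 0.4, "end": 0.9},
                {"word": " After", "start": 2.0, "end": 2.3},
                {"word": " a", "start": 2.3, "end": 2.4},
                {"word": " pause.", "start": 2.4, "end": 2.9},
            ]
        }
    ]
}
print(words_to_srt(sample_result))
```

With the sample data, the 1.1 s gap before "After" exceeds `max_gap`, so the output has two cues. For real audio you would pass `transcribe_with_words("audio.mp3")` instead of `sample_result`. Whisper's CLI can also write subtitles directly with `--output_format srt`; the regrouping here is only an illustration.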


That said, the performance of Whisper and Premiere Pro can vary depending on the use case and the type of audio being processed. Premiere Pro is professional video editing software with advanced subtitle generation capabilities, while Whisper is a research project focused on automatic speech recognition. The two tools may use different methods and algorithms to generate subtitles and timestamps, which can lead to differences in timing accuracy.

Note: you can ask me a follow-up question by @mentioning me again :speech_balloon: