#whisper bad
Hi @tall bison. I'm kapa.ai, your friendly support bot. I will try to answer your question by searching through the knowledge base. :wave:
Whisper, OpenAI's automatic speech recognition (ASR) system, is a powerful tool for transcribing audio and generating subtitles. However, it was not explicitly trained to produce word-level timestamps. Its current word timings come from an inference-time trick, so they may not be perfectly accurate, especially around pauses (source (https://github.com/openai/whisper/discussions/1855)).
Several users have reported subtitles drifting out of alignment with the audio timeline and imprecise or inaccurate timestamps (source (https://github.com/openai/whisper/discussions/1995), source (https://github.com/openai/whisper/discussions/1147), source (https://github.com/openai/whisper/discussions/809), source (https://github.com/openai/whisper/discussions/89), source (https://github.com/openai/whisper/discussions/1046), source (https://github.com/openai/whisper/discussions/124), source (https://github.com/openai/whisper/discussions/1720)).
To improve timestamp accuracy, try running Whisper with the `--word_timestamps True` option. It adds word-level timestamps to the output and also uses them to refine the segment timestamps (source (https://github.com/openai/whisper/discussions/1995), source (https://github.com/openai/whisper/discussions/1147), source (https://github.com/openai/whisper/discussions/1855)).
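If you use Whisper from Python rather than the CLI, the equivalent is passing `word_timestamps=True` to `model.transcribe()` in the `openai-whisper` package. The sketch below assumes that package is installed and that the result has the usual shape, where each entry in `result["segments"]` carries a `"words"` list with `"word"`, `"start"` and `"end"` keys. The helpers `transcribe_with_word_timestamps`, `format_timestamp` and `word_timing_lines` are hypothetical names for illustration. They simply list every word with its timing so you can spot where subtitles drift, for example around pauses.

```python
def transcribe_with_word_timestamps(audio_path: str, model_name: str = "base") -> dict:
    """Run Whisper with word-level timestamps enabled (requires openai-whisper)."""
    import whisper  # imported lazily so the helpers below work without the package

    model = whisper.load_model(model_name)
    # word_timestamps=True is the Python counterpart of the CLI's --word_timestamps True
    return model.transcribe(audio_path, word_timestamps=True)


def format_timestamp(seconds: float) -> str:
    """Format seconds as an SRT-style HH:MM:SS,mmm string (hypothetical helper)."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def word_timing_lines(result: dict) -> list[str]:
    """Return one 'start --> end word' line per word (hypothetical helper).

    Assumes the result dict follows Whisper's segment/word structure; segments
    without a "words" list (e.g. word timestamps disabled) are skipped.
    """
    lines = []
    for segment in result["segments"]:
        for word in segment.get("words", []):
            start = format_timestamp(word["start"])
            end = format_timestamp(word["end"])
            lines.append(f"{start} --> {end} {word['word'].strip()}")
    return lines
```

Usage: `for line in word_timing_lines(transcribe_with_word_timestamps("audio.mp3")): print(line)`. From the command line, the same option is `whisper audio.mp3 --word_timestamps True`.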