I heard somewhere that Whisper only natively supports audio clips of 30 seconds or less, meaning it gives the best accuracy as long as your clip is around that length. Is this 100% true? I'm trying to transcribe lengthy audio clips (some are hours long), and I'd rather split them into clips longer than 30s each to make the most of the 50 requests per minute limit. Would splitting them into 60 or 120 second clips (or longer) be fine as well, or would that be less accurate than 30s per clip?
#Whisper API accuracy limit
@astral pawn Surely longer clips are allowed, let's say up to 10 minutes in length?
Hello @astral pawn! From what I've found in https://platform.openai.com/docs/guides/speech-to-text/longer-inputs, the API currently supports audio files smaller than 25 MB. So I guess however long your audio is, it will be fine as long as each file is under that limit. Just make sure you avoid breaking the audio up mid-sentence, as this may cause some context to be lost.
Edit: Also based on this guide https://platform.openai.com/docs/guides/speech-to-text/prompting, you can use a prompt to improve the quality of the transcripts generated by Whisper. Hope this helps!
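If it helps, here's a rough sketch of how you could work out how long each clip can be before it hits the 25 MB limit mentioned above. It only does the arithmetic and plans the cut points; it doesn't touch any audio. The names `max_chunk_seconds` and `plan_chunks` are made up for illustration, it assumes a constant bitrate, and it reads "25 MB" as 25,000,000 bytes (the more conservative interpretation, which is my assumption, not something the guide states).

```python
import math

# Assumption: "25 MB" is read as 25,000,000 bytes (the conservative reading).
MAX_UPLOAD_BYTES = 25 * 1000 * 1000


def max_chunk_seconds(bitrate_kbps: float, safety: float = 0.95) -> float:
    """Longest clip duration that stays under the size limit.

    Assumes a constant bitrate; `safety` leaves headroom for container overhead.
    """
    bytes_per_second = bitrate_kbps * 1000 / 8
    return MAX_UPLOAD_BYTES * safety / bytes_per_second


def plan_chunks(total_seconds: float, chunk_seconds: float) -> list[tuple[float, float]]:
    """Return (start, end) offsets covering the whole recording.

    These are fixed-length cuts; they do NOT avoid mid-sentence breaks.
    You would still need to nudge each boundary to a nearby pause.
    """
    count = math.ceil(total_seconds / chunk_seconds)
    return [
        (i * chunk_seconds, min((i + 1) * chunk_seconds, total_seconds))
        for i in range(count)
    ]


if __name__ == "__main__":
    # Hypothetical example: a 2-hour recording encoded at 128 kbps.
    limit = max_chunk_seconds(128)
    for start, end in plan_chunks(2 * 60 * 60, limit):
        print(f"{start:.0f}s to {end:.0f}s")
```

At 128 kbps that works out to roughly 24 minutes per clip, so the file size limit is a lot looser than 30 or 120 seconds. The planned offsets are only starting points, since you'd still want to move each cut to a pause so a sentence isn't split.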