The attached file is audio from https://www.youtube.com/watch?v=tFsKulFcauY
The audio contains both Spanish and English. I want to transcribe audio like this accurately and quickly, with speaker diarization.
I have tried several approaches. The only one that has worked so far is whisper-large with speaker diarization enabled. However, whisper-large is slow.
In terms of accuracy, whisper-large rarely makes mistakes. The default whisper model and nova-2, by contrast, often end up returning only the Spanish part of the audio.
Do you have any recommendations for methods I should try for this use case?
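For reference, combining a transcript with speaker diarization usually comes down to matching timed transcript segments against timed speaker turns. The following is only a minimal sketch of that merge step, not the exact pipeline used above: the `Segment` and `Turn` types, the `assign_speakers` helper, the speaker labels, and the sample timings and text are all hypothetical, and it assumes the transcriber and the diarizer both report start and end times in seconds.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcribed chunk (hypothetical shape of a transcriber's output)."""
    start: float
    end: float
    text: str
    language: str


@dataclass
class Turn:
    """One speaker turn (hypothetical shape of a diarizer's output)."""
    start: float
    end: float
    speaker: str


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments: list[Segment], turns: list[Turn]) -> list[tuple[str, str, str]]:
    """Label each segment with the speaker whose turn overlaps it the most."""
    labeled = []
    for seg in segments:
        best_speaker = "UNKNOWN"
        best_overlap = 0.0
        for turn in turns:
            shared = overlap(seg.start, seg.end, turn.start, turn.end)
            if shared > best_overlap:
                best_overlap = shared
                best_speaker = turn.speaker
        labeled.append((best_speaker, seg.language, seg.text))
    return labeled


# Made-up sample data imitating consecutive interpreting:
# one speaker in Spanish, then an interpreter in English.
segments = [
    Segment(0.0, 4.0, "Buenos días, ¿cómo se llama usted?", "es"),
    Segment(4.5, 8.0, "Good morning, what is your name?", "en"),
]
turns = [
    Turn(0.0, 4.2, "SPEAKER_00"),
    Turn(4.2, 8.5, "SPEAKER_01"),
]

result = assign_speakers(segments, turns)
for speaker, language, text in result:
    print(f"[{speaker}] ({language}) {text}")
```

Keeping a language tag per segment makes it easy to spot the failure described above, where one of the two languages is missing from the output entirely.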
This video shows an example of consecutive interpreting. The interpreter waits for the primary speaker to finish and then interprets everything that was said.
• Find resources for Federal Court Interpreters: https://www.uscourts.gov/services-forms/federal-court-interpreters
• Sign up for U.S. Courts email updates: http://www.uscourts.gov/email-...