#Help With Multi-Lingual Audio Files


opaque parrot
#

The attached file is audio from https://www.youtube.com/watch?v=tFsKulFcauY
The audio contains both Spanish and English. I want to transcribe audio like this accurately and quickly, with speaker diarization.
I have tried a number of different approaches. The only one that has worked so far is whisper-large with speaker diarization enabled. However, whisper-large is slow.

In terms of accuracy, whisper-large rarely bugs out. The default whisper model and nova-2 often end up transcribing only the Spanish part of the audio.
Do you have any recommendations of methods I should try for this use case?

This video shows an example of consecutive interpreting. The interpreter waits for the primary speaker to finish and then interprets everything that was said.

• Find resources for Federal Court Interpreters: https://www.uscourts.gov/services-forms/federal-court-interpreters
• Sign up for U.S. Courts email updates: http://www.uscourts.gov/email-...

mossy blaze BOT
#

Thanks for asking your question. Please be sure to reply with as much detail as possible so we can assist you efficiently. Such as:

  • Provide the request_id if you've a question about a transcription response.
  • The options you used or the api.deepgram.com URL you sent your request to, including parameters.
  • Any code snippets you can include.
  • Any audio you can include, or if you can't share it here please email it to us at [email protected] and provide a link to this thread.
floral crag
#

I think this would fall under the term code-switching (even though in strict linguistic terms it isn't technically code-switching, since that would mean switching languages within the same sentence). In speech-to-text, code-switching refers to any switch from one language to another within the same audio sample.

We are currently working on developing this feature for Nova-2, but it isn't available yet. Whisper is able to do it but there have been reports of bugs where the language output isn't always accurate (and our Whisper cloud is slow, like you said).

I'm sorry I can't give you a better answer. The best option for now is to use Whisper, but stay tuned for our code-switching support, which is going to be released for Nova-2 in the next few months, specifically for English and Spanish.
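
For reference, here is a minimal Python sketch of the Whisper-plus-diarization request discussed above. It assumes the pre-recorded `/v1/listen` endpoint with `model=whisper-large` and `diarize=true` passed as query parameters. The helper names (`build_listen_url`, `transcribe_file`, `speaker_turns`), the `audio/wav` content type, and the response field layout are assumptions for illustration only, so check them against the current API docs before relying on them.

```python
import json
import urllib.parse
import urllib.request

DEEPGRAM_LISTEN_URL = "https://api.deepgram.com/v1/listen"


def build_listen_url(model="whisper-large", diarize=True):
    """Build the pre-recorded transcription URL with its query parameters."""
    params = {"model": model, "diarize": str(diarize).lower()}
    return DEEPGRAM_LISTEN_URL + "?" + urllib.parse.urlencode(params)


def transcribe_file(path, api_key, model="whisper-large"):
    """POST a local audio file and return the parsed JSON response."""
    with open(path, "rb") as audio:
        body = audio.read()
    request = urllib.request.Request(
        build_listen_url(model),
        data=body,
        headers={
            "Authorization": "Token " + api_key,
            # Assumption: the file is WAV; change this to match your audio.
            "Content-Type": "audio/wav",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def speaker_turns(response):
    """Collapse word-level results into (speaker, text) turns.

    Assumes each word entry carries a `speaker` index when diarization
    is enabled, and optionally a `punctuated_word`.
    """
    words = response["results"]["channels"][0]["alternatives"][0]["words"]
    turns = []
    for entry in words:
        speaker = entry.get("speaker", 0)
        token = entry.get("punctuated_word", entry["word"])
        if turns and turns[-1][0] == speaker:
            # Same speaker as the previous word: extend the current turn.
            turns[-1][1].append(token)
        else:
            turns.append((speaker, [token]))
    return [(speaker, " ".join(tokens)) for speaker, tokens in turns]
```

Usage would look something like `speaker_turns(transcribe_file("interview.wav", api_key))`, where the file name is a placeholder. Grouping by speaker makes it easier to spot turns where the English half of an interpreted exchange is missing from the transcript.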