How does the streaming API filter out other languages? | Deepgram

teal shale Jul 15, 2024, 5:16 PM

I see that language detection is only available with the pre-recorded API (https://developers.deepgram.com/docs/language-detection), but I've also noticed that when using the streaming API, it does a fairly good job filtering out audio that isn't the target language provided to the API. I'm wondering how this works? Clearly the streaming API can do some language detection since it's properly ignoring other languages.

wet night BOT Jul 15, 2024, 5:16 PM

Thanks for asking your question. Please be sure to reply with as much detail as possible so we can assist you efficiently. Such as:

Provide the request_id if you have a question about a transcription response.
The options you used or the api.deepgram.com URL you sent your request to, including parameters.
Any code snippets you can include.
Any audio you can include, or if you can't share it here please email it to us at [email protected] and provide a link to this thread.

shy juniper Jul 18, 2024, 2:36 AM

Great question! One way to think about this is the difference between detecting whether a word is not English (or whatever your target language is), versus detecting what language that word actually is.

When you use an English speech-to-text model, it’s trying to predict the English words that were said. If other sounds are present (including non-English utterances), the model is smart enough to say “okay, that isn’t an English word, I’m not going to predict anything for that”. Hence, non-English audio is just ignored.

That turns out to be a very different problem under the hood than the question “given a live stream of words, deterministically tell me what language each word is, and then transcribe it in that language”. All of a sudden, the domain space is much larger, and the tradeoff between latency, accuracy, and performance becomes trickier.

With that said, we do have very active research in this area, and there are some solutions available. For example, our new Spanglish model can handle English and Spanish in the same audio stream! https://deepgram.com/changelog/nova-2-spanglish

teal shale Jul 19, 2024, 7:34 PM

Thanks for the response! Super helpful. We are noticing that non-English speech (or speech in any language other than the target) sometimes still gets transcribed by the streaming API, but the output is mostly gibberish. Is it safe to assume that the confidence level of the transcribed text would be lower in that case than when properly transcribing audio in the target language?
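
One way to check this assumption against your own audio is to log the per-word confidence from each streaming result and see whether the gibberish segments actually score lower. The sketch below is only an illustration: it assumes the streaming result message is JSON with a `channel.alternatives[0].words` list whose entries carry `word` and `confidence` fields, and the helper name `low_confidence_words`, the sample message, and the `0.6` threshold are all hypothetical values chosen for the example, not recommendations from Deepgram.

```python
import json


def low_confidence_words(message: str, threshold: float = 0.6):
    """Return (word, confidence) pairs below `threshold` from one result message.

    Assumes the message has the shape channel -> alternatives -> words.
    The threshold is an arbitrary placeholder; tune it on your own audio.
    """
    payload = json.loads(message)
    alternatives = payload.get("channel", {}).get("alternatives", [])
    if not alternatives:
        # Messages without a transcript (e.g. metadata) yield nothing.
        return []
    words = alternatives[0].get("words", [])
    return [
        (w["word"], w["confidence"])
        for w in words
        if w["confidence"] < threshold
    ]


# Hand-written sample message, for illustration only (not real API output).
sample = json.dumps({
    "type": "Results",
    "is_final": True,
    "channel": {
        "alternatives": [{
            "transcript": "hello there kessa",
            "confidence": 0.81,
            "words": [
                {"word": "hello", "start": 0.1, "end": 0.4, "confidence": 0.98},
                {"word": "there", "start": 0.5, "end": 0.8, "confidence": 0.95},
                {"word": "kessa", "start": 1.2, "end": 1.6, "confidence": 0.31},
            ],
        }]
    },
})

flagged = low_confidence_words(sample)
print(flagged)
```

Whether low confidence reliably separates off-language speech from ordinary hard-to-hear target-language speech is exactly the open question here, so treat any threshold as something to validate on labeled samples before filtering on it.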
