#very short audio content like 2 seconds doesn't give accurate word timestamps


dull creek
#

Is this a documented problem or a one-off occurrence?

pale snow BOT
#

Hi @dull creek. I'm kapa.ai, your friendly support bot. I will try to answer your question by searching through the knowledge base. :wave:

#

Yes, this is a documented problem. Users have reported in several GitHub discussions that the word start and end timestamps returned by Deepgram's speech recognition models can be inaccurate, for both short and long audio content.

For example, in one discussion (https://github.com/orgs/deepgram/discussions/105), a user reported a possible bug with the start and end times of words. The user found that the timestamps for some words were either shortened or lengthened, causing the extracted audio for those words to be off.
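To illustrate why shifted timestamps matter when cutting a word out of the audio, here is a minimal Python sketch. It assumes each word comes with `start` and `end` times in seconds; the function name `word_sample_range` and the `pad_s` margin are hypothetical and not part of any Deepgram SDK. Padding both sides is a generic workaround, not a fix confirmed in that discussion.

```python
def word_sample_range(start_s, end_s, sample_rate, total_samples, pad_s=0.05):
    """Map a word's start/end times (seconds) to a sample index range.

    pad_s widens the range on both sides so that a slightly early or late
    timestamp is less likely to clip the word. The result is clamped to
    the bounds of the audio.
    """
    if end_s < start_s:
        raise ValueError("end_s must not be earlier than start_s")
    first = max(0, int(round((start_s - pad_s) * sample_rate)))
    last = min(total_samples, int(round((end_s + pad_s) * sample_rate)))
    return first, last


# A 2-second clip at 16 kHz has 32000 samples.
first, last = word_sample_range(0.5, 1.0, 16000, 32000)
word_samples = list(range(32000))[first:last]  # stand-in for real audio samples
```

With a 50 ms margin, a timestamp that is off by a few tens of milliseconds still falls inside the extracted slice, at the cost of sometimes including a sliver of the neighbouring word.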

In another discussion (https://github.com/orgs/deepgram/discussions/777), users reported that the start and end timestamps returned with streaming transcription were incorrect. The start timestamp was frequently found to be too early, and the start timestamp of the first word was occasionally wrong.

Deepgram's team has acknowledged these issues and has been working on fixes. In one case, they mentioned that a fix for the timestamp issue was in testing and would be released soon.

However, these issues may not be specific to short audio content. Audio quality, background noise, and the specific words spoken can also affect the accuracy of the transcription and timestamps.

If you're experiencing issues with the accuracy of word timestamps for short audio content, it might be helpful to reach out to Deepgram's support or check their FAQ (https://deepgram.gitbook.io/help-center/faq/why-do-i-sometimes-get-different-results-with-the-same-audio) for more information.

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:


dull creek
#

But I'm only curious about short-form audio, like clips of 2 seconds at most.


pale snow BOT
#

Let me check the knowledge base for you again. One moment! :books:

#

The knowledge sources provided do not specifically address short audio clips of around 2 seconds. That said, speech recognition accuracy depends on more than clip length: audio quality, clarity of the speech, background noise, and the specific words spoken all play a part.

In the case of very short audio clips, one potential issue could be that the speech does not provide enough context for the model to accurately transcribe and assign timestamps. This is because automatic speech recognition models, like those used by Deepgram, often rely on the surrounding context to accurately recognize and transcribe speech.
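One way to tell whether a particular short clip is affected is to sanity-check the returned word timings against the clip itself. The Python sketch below assumes each word is a dict with `word`, `start`, and `end` fields (times in seconds), similar to a word list in a transcription response; the function name `find_suspect_words` and the `tolerance_s` value are hypothetical choices for illustration.

```python
def find_suspect_words(words, clip_duration_s, tolerance_s=0.02):
    """Return (word, reasons) pairs for words whose timings look implausible.

    A word is flagged if it falls outside the clip, has zero or negative
    duration, or starts before the previous word ended (beyond tolerance_s).
    """
    suspects = []
    prev_end = 0.0
    for w in words:
        reasons = []
        if w["start"] < -tolerance_s or w["end"] > clip_duration_s + tolerance_s:
            reasons.append("outside clip")
        if w["end"] <= w["start"]:
            reasons.append("non-positive duration")
        if w["start"] < prev_end - tolerance_s:
            reasons.append("overlaps previous word")
        if reasons:
            suspects.append((w["word"], reasons))
        prev_end = max(prev_end, w["end"])
    return suspects


# Made-up timings for a 2-second clip, purely for illustration.
example_words = [
    {"word": "hello", "start": 0.1, "end": 0.5},
    {"word": "there", "start": 0.4, "end": 0.9},
    {"word": "friend", "start": 1.0, "end": 2.4},
]
suspects = find_suspect_words(example_words, clip_duration_s=2.0)
```

A check like this only catches timings that are structurally impossible. It cannot detect a timestamp that is plausible but shifted, which still requires listening to the extracted audio.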

If you're experiencing issues with the accuracy of word timestamps for short audio clips, it might be helpful to reach out to Deepgram's support or check their FAQ (https://deepgram.gitbook.io/help-center/faq/why-do-i-sometimes-get-different-results-with-the-same-audio) for more information. They may be able to provide more specific guidance or potential solutions for your use case.
