#Why does Deepgram hold the transcript?


ember bramble

We don't want to use interim results, but we do want to get the final transcript sooner. However, Deepgram holds back our transcript.

Parameters for Deepgram:
params = {
    "model": "phonecall",        # Specialized model for phone calls
    "punctuate": "true",
    "language": "en-US",
    "encoding": "mulaw",         # Twilio's audio encoding
    "sample_rate": "8000",       # Twilio's sample rate
    "channels": "1",             # Mono audio
    "interim_results": "false",  # Disable interim (partial) results
    "smart_format": "true",
}

Example: Why did Deepgram hold back the last final transcript when it was spoken continuously with the previous one?
2025-02-18 11:42:22 [INFO] [935feee0] Pause detected - No audio for 1101.709ms (server.py:297)
2025-02-18 11:42:22 [INFO] [935feee0] Pause detected - FINAL transcript: Loan care. Year end irs 1,098 tax forms will be mailed by January 31 and can now be viewed online by logging into your account at like the Servicing 1,098 forms will not be issued for interest paid amounts less than $600 This call may be monitored and recorded for quality purposes. In order to direct your call, please help me find your account. Press 1 to enter your loan number. Press 2 to enter your Social security number. Press 3 To your tax Id number. Press 4 (server.py:300)
2025-02-18 11:42:29 [INFO] [935feee0] Pause detected - No audio for 1156.783ms (server.py:297)
2025-02-18 11:42:29 [INFO] [935feee0] Pause detected - FINAL transcript: If you do not of this information, (server.py:300)

proven lichen (BOT)

Hi @ember bramble. I'm kapa.ai, your friendly support bot. I will try to answer your question by searching through the knowledge base. :wave:

latent nymph (BOT)

Hey there! This channel allows you to ask Kapa.ai for assistance, but it looks like you haven't connected your Discord profile to your Deepgram account yet. You can do so at https://community.deepgram.com. Once connected, you'll be able to ask questions in #1115960287183511643 if you need human assistance!

proven lichen (BOT)

Based on the information provided in the knowledge sources, I can explain why Deepgram appears to be holding the transcript in your case.

Deepgram's transcription process for live streaming audio is designed to balance accuracy with timely delivery of results. Even when interim results are disabled (as in your case with "interim_results": "false"), Deepgram still processes audio in chunks and may not immediately return a final transcript for several reasons:

  1. Accuracy optimization: Deepgram's algorithm waits for a certain amount of audio context before finalizing a transcript. As mentioned in the GitHub discussion about final results, final results are typically streamed once every 3-5 seconds. This delay allows the model to gather sufficient context to improve accuracy.

  2. Endpointing: Although not explicitly set in your parameters, Deepgram uses endpointing by default. This feature looks for natural pauses in speech to determine when to return a finalized transcript. As explained in the Deepgram documentation on endpointing, the default setting returns transcripts after detecting 10 milliseconds of silence.

  3. Voice Activity Detection (VAD): Deepgram uses a Voice Activity Detector to determine when speech has ended. This is not based on pure silence but on the detection of voice activity. As mentioned in a GitHub discussion about endpointing, the VAD-based endpointing algorithm works off the raw audio, which can sometimes lead to unexpected behavior, especially with noisy connections like Twilio.
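As a sketch, endpointing can also be set explicitly instead of relying on the default. The snippet below adds an `endpointing` value (in milliseconds) to the parameters from the question and builds the query string for Deepgram's live streaming endpoint, `wss://api.deepgram.com/v1/listen`. The 300 ms value and the `build_listen_url` helper are illustrative assumptions, not recommendations; tune the value against your own call audio.

```python
from urllib.parse import urlencode

# Parameters from the question, with endpointing set explicitly.
params = {
    "model": "phonecall",
    "punctuate": "true",
    "language": "en-US",
    "encoding": "mulaw",
    "sample_rate": "8000",
    "channels": "1",
    "interim_results": "false",
    "smart_format": "true",
    # Assumption: an illustrative 300 ms of detected non-speech closes an utterance.
    "endpointing": "300",
}

DEEPGRAM_LISTEN_URL = "wss://api.deepgram.com/v1/listen"


def build_listen_url(query_params: dict) -> str:
    """Hypothetical helper: return the streaming URL with params encoded as a query string."""
    return f"{DEEPGRAM_LISTEN_URL}?{urlencode(query_params)}"


if __name__ == "__main__":
    print(build_listen_url(params))
```

A larger value waits longer before finalizing (fewer mid-sentence splits); a smaller one returns results sooner but may cut utterances at short pauses.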


In your specific example, where you're seeing pauses detected and final transcripts held, it's likely due to a combination of these factors. The system is waiting for natural pauses in speech (endpointing) and gathering sufficient audio context to optimize accuracy before returning a final transcript.
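To see which mechanism closed a given result, it may help to log the flags on each live `Results` message: `is_final` marks a finalized transcript segment, and `speech_final` is set when endpointing detected the end of speech. The sketch below parses one message; the sample payload is a hand-written, trimmed example for illustration (not real Deepgram output), and `describe_result` is a hypothetical helper.

```python
import json

# Hand-written, trimmed sample of a live "Results" message (illustrative only).
SAMPLE_MESSAGE = json.dumps({
    "type": "Results",
    "is_final": True,
    "speech_final": False,
    "channel": {"alternatives": [{"transcript": "Press 1 to enter your loan number."}]},
})


def describe_result(raw: str) -> str:
    """Summarize why a live result was emitted, based on its is_final/speech_final flags."""
    msg = json.loads(raw)
    if msg.get("type") != "Results":
        return "non-transcript message"
    transcript = msg["channel"]["alternatives"][0]["transcript"]
    if msg.get("speech_final"):
        reason = "endpoint detected"
    elif msg.get("is_final"):
        reason = "finalized without an endpoint"
    else:
        reason = "interim"
    return f"[{reason}] {transcript}"


if __name__ == "__main__":
    print(describe_result(SAMPLE_MESSAGE))
```

Logging this next to the pause timestamps in `server.py` would show whether a held segment was waiting on an endpoint or was finalized for another reason.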

Unfortunately, based on the current knowledge sources, there isn't a direct way to force Deepgram to return final results faster without potentially sacrificing accuracy. The Deepgram team has acknowledged that this is "just how Deepgram works" for now, as stated in the GitHub discussion about getting final results faster.

If you absolutely need faster results and can tolerate potentially lower accuracy, you might consider experimenting with interim results and implementing your own logic to determine when to consider a result "final" based on your specific use case requirements.
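A minimal sketch of that approach, assuming you re-enable `"interim_results": "true"` and feed each interim transcript into a helper. `InterimFinalizer`, `stable_ms`, and the 600 ms threshold are hypothetical names and values chosen for illustration; the idea is simply to treat an interim transcript as final once its text has stopped changing for a while, accepting that Deepgram might still revise it.

```python
import time
from typing import Callable, Optional


class InterimFinalizer:
    """Hypothetical helper: treat an interim transcript as final once it stops changing.

    Assumption: `stable_ms` of unchanged text is "final enough" for your use case.
    This trades accuracy for latency.
    """

    def __init__(self, stable_ms: float = 600.0, clock: Callable[[], float] = time.monotonic):
        self.stable_ms = stable_ms
        self.clock = clock  # returns seconds; injectable for testing
        self.reset()

    def reset(self) -> None:
        """Clear state, e.g. when Deepgram's own is_final result for the segment arrives."""
        self._text = ""
        self._changed_at: Optional[float] = None
        self._emitted = False

    def on_interim(self, transcript: str) -> None:
        """Record the latest interim transcript; restart the timer if the text changed."""
        if transcript != self._text:
            self._text = transcript
            self._changed_at = self.clock()
            self._emitted = False

    def poll(self) -> Optional[str]:
        """Return the transcript once it has been stable long enough, at most once per text."""
        if self._emitted or not self._text or self._changed_at is None:
            return None
        elapsed_ms = (self.clock() - self._changed_at) * 1000.0
        if elapsed_ms >= self.stable_ms:
            self._emitted = True
            return self._text
        return None


if __name__ == "__main__":
    # Simulated clock (in seconds) so the example is deterministic.
    now = [0.0]
    finalizer = InterimFinalizer(stable_ms=600, clock=lambda: now[0])
    finalizer.on_interim("Press 1 to enter")
    now[0] = 0.3
    finalizer.on_interim("Press 1 to enter your loan number.")
    now[0] = 0.8
    print(finalizer.poll())  # None: only 500 ms since the last change
    now[0] = 1.0
    print(finalizer.poll())  # Press 1 to enter your loan number.
```

The injected `clock` makes the behavior easy to test; in production the default `time.monotonic` would be used and `poll()` called periodically, for example from the loop that forwards audio. Calling `reset()` when Deepgram's own `is_final` result for the segment arrives clears the state before the next segment begins.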

Note: you can ask me a follow-up question by @-mentioning me again :speech_balloon: