#Simple webm file incorrectly outputting very short transcript


clever turret

Hiya - for this WebM file, when I transcribe with code like this:

options = PrerecordedOptions(smart_format=True, model="nova-3", diarize=True)

response = deepgram.listen.rest.v("1").transcribe_url(
    {"url": presigned_url}, options, timeout=httpx.Timeout(300.0, connect=10.0)
)

It outputs this very short transcript:
[
  {
    "speaker": "Speaker 0",
    "text": "okay that's recording so what should we talk about should we talk about our weekends yes so what did you come to so on the weekend yeah",
    "start_time": 0.88,
    "end_time": 210.50493
  }
]

which is cutting off a big chunk of the audio. I've attached the file in question.
Many thanks in advance for the help!

slow grail (BOT)

Thanks for asking your question. Please be sure to reply with as much detail as possible so the community can assist you efficiently.

paper falcon

It looks like you are using our Python SDK for this request.

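  1. The connect=10.0 timeout is part of the httpx client configuration and specifically controls how long the client will wait to establish the initial connection to the Deepgram API. This is separate from the 300-second value, which httpx applies as the default for the other phases of the request (read, write, and pool) rather than as a single limit on the whole request.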

  2. Regarding the transcript being so short (a single segment spanning 0.88 to 210.50493 seconds that contains only a few sentences), there are a few potential issues:

    a) The presigned URL might be expiring or having access issues. When using presigned URLs, they typically have a limited lifetime and might expire during the transcription process.
    b) The audio file might be getting cut off during download. The SDK makes a POST request to the Deepgram API with the URL, and Deepgram then attempts to download the audio file. If there are any issues with the download, it could result in partial transcription.
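
To check possibility (a) before submitting, here is a minimal sketch that estimates how long a presigned URL has left. It assumes an AWS S3 SigV4-style URL carrying X-Amz-Date and X-Amz-Expires query parameters; the thread does not say which storage provider is in use, so treat those parameter names and the helper name presigned_url_seconds_left as assumptions.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlparse


def presigned_url_seconds_left(url, now=None):
    """Return seconds until the URL expires, or None if it is not SigV4-style.

    Assumption: the URL has X-Amz-Date (signing time, UTC) and
    X-Amz-Expires (lifetime in seconds) query parameters, as S3 uses.
    """
    query = parse_qs(urlparse(url).query)
    try:
        signed_at = datetime.strptime(
            query["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ"
        ).replace(tzinfo=timezone.utc)
        lifetime = int(query["X-Amz-Expires"][0])
    except (KeyError, ValueError):
        # Parameters missing or malformed: we cannot tell from the URL alone.
        return None
    now = now or datetime.now(timezone.utc)
    return (signed_at + timedelta(seconds=lifetime) - now).total_seconds()
```

If the result is smaller than the time the download and transcription could plausibly take, regenerate the URL with a longer lifetime before calling transcribe_url.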

Here are my recommendations:

  1. Next Step: Try using a direct file upload instead of a presigned URL to rule out URL expiration issues.
  2. Future Steps:
    • If using a presigned URL is necessary, ensure it has a long enough expiration time (e.g., 1 hour)
    • Consider using the async callback endpoint if the file is large, which can handle longer processing times
    • Verify the audio file is complete and accessible by testing it with a direct download
    • Monitor the network connection during the transcription process to ensure it's stable
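
For the direct upload suggested as the next step, a rough sketch could look like the following. It assumes the same Python SDK version as in the question (the one exposing deepgram.listen.rest) and reuses the options and timeout from the original call; the helper names build_file_payload and transcribe_local_file, and the path and API key arguments, are placeholders rather than anything from the thread.

```python
def build_file_payload(path):
    """Read a local audio file into the {"buffer": bytes} shape used for uploads."""
    with open(path, "rb") as audio:
        return {"buffer": audio.read()}


def transcribe_local_file(path, api_key):
    """Send the file bytes to Deepgram directly instead of passing a URL."""
    # Imported here so build_file_payload works without the SDK installed.
    import httpx
    from deepgram import DeepgramClient, PrerecordedOptions

    deepgram = DeepgramClient(api_key)
    # Same options as the transcribe_url call in the question.
    options = PrerecordedOptions(smart_format=True, model="nova-3", diarize=True)
    return deepgram.listen.rest.v("1").transcribe_file(
        build_file_payload(path),
        options,
        timeout=httpx.Timeout(300.0, connect=10.0),
    )
```

If the uploaded file produces a full transcript while the presigned URL does not, the URL or the download is the likely culprit; if both come back equally short, the issue is more likely in the audio itself.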