I am using the nova-2 STT model. Should you use the pre-recorded audio API for 30-second segments that have just been recorded and are supposed to be transcribed right afterwards, in order to save costs? Or should you use the livestreaming API instead? How long does the pre-recorded audio have to be, would you really save any costs, and how much faster and/or less accurate is the livestream version?
#Pre-recorded vs. Livestream
Thanks for asking your question. Please be sure to reply with as much detail as possible so we can assist you efficiently.
It looks like we're missing some important information to help debug your issue. Would you mind providing us with the following details in a reply?
- The programming language you are working in (e.g. JavaScript, Python).
- The Deepgram product you are using (e.g. Speech to Text, Agent API).
- A request ID that triggered your error or issue.
STT nova-2
Accuracy: Pre-recorded audio transcription is slightly more accurate than streaming. According to Deepgram's benchmarks, pre-recorded transcription has an 8.4% word error rate (WER), while streaming has a 10.7% WER for English. This difference is modest but might be significant depending on your use case.
Speed: For 30-second segments, the speed difference between pre-recorded and streaming is likely to be minimal. Deepgram's nova-2 model is designed for high-speed transcription in both modes (see the Models & Languages Overview in the docs).
Cost: Deepgram bills per second of audio, not rounded up to the nearest minute, so short 30-second segments carry no rounding penalty in either mode. The per-minute rates do differ, though: Nova-2 pre-recorded is $0.0043/min while streaming is $0.0059/min, so pre-recorded is the cheaper option for the same amount of audio. See the pricing page: https://deepgram.com/pricing
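To put numbers on that for a single 30-second segment, here is a rough sketch of the arithmetic. It assumes the two per-minute rates quoted above and pure per-second billing; the function name is just for illustration, and your actual plan's rates may differ.

```python
# Per-minute rates quoted above (assumed current; check the pricing page).
PRERECORDED_PER_MIN = 0.0043
STREAMING_PER_MIN = 0.0059


def segment_cost(seconds, rate_per_min):
    """Cost of one segment under per-second billing (no rounding up)."""
    return seconds / 60 * rate_per_min


pre = segment_cost(30, PRERECORDED_PER_MIN)
live = segment_cost(30, STREAMING_PER_MIN)

print(f"pre-recorded: ${pre:.5f} per 30 s segment")
print(f"streaming:    ${live:.5f} per 30 s segment")
print(f"streaming costs about {live / pre - 1:.0%} more")
```

So the saving per segment is a fraction of a cent, but the relative difference stays the same at any volume.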
Minimum duration: There is no minimum duration for pre-recorded audio. You can transcribe audio of any length, including 30-second segments. However, with some languages I do think that really short audio doesn't transcribe as well (but that's just based on what I've heard in the community)
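Use case considerations: If you need the text while the audio is still being recorded, the livestreaming API is the better fit for that kind of real-time application. If each segment is only sent once it has finished recording, pre-recorded should work fine and comes with the lower rate and slightly better accuracy mentioned above.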
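If you go the pre-recorded route, sending a just-finished segment could look roughly like the sketch below. It uses only the Python standard library and assumes the usual pre-recorded REST endpoint (`/v1/listen`), the `Token` authorization header, and the `results.channels[0].alternatives[0].transcript` response path; please double-check those against the current API reference. The helper names and the WAV content type are assumptions for illustration.

```python
import json
import urllib.request

# Assumed pre-recorded endpoint with the model selected via query parameter.
DEEPGRAM_URL = "https://api.deepgram.com/v1/listen?model=nova-2"


def build_request(audio_bytes, api_key, content_type="audio/wav"):
    """Build (but do not send) a POST request carrying the raw audio."""
    return urllib.request.Request(
        DEEPGRAM_URL,
        data=audio_bytes,
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": content_type,
        },
        method="POST",
    )


def transcribe(audio_bytes, api_key):
    """Send one recorded segment and return the first transcript."""
    with urllib.request.urlopen(build_request(audio_bytes, api_key)) as resp:
        body = json.load(resp)
    # Assumed response layout; adjust if the API returns something different.
    return body["results"]["channels"][0]["alternatives"][0]["transcript"]
```

You would call `transcribe(segment_bytes, api_key)` once per 30-second segment, right after it finishes recording.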