Deepgram’s utt_split is based on detected pauses in speech, not on a fixed maximum utterance length. That means:
- utterances=true must be set, or utt_split has no effect at all.[Improvements audit; API params]
- utt_split is “seconds to wait before detecting a pause between words in submitted audio”; if the speaker doesn’t actually pause for at least that long, the utterance will not be split, no matter how long the sentence is.[API params]
Judging by the GitHub discussion, you’re running into the same behavior others have seen:
- Decreasing utt_split below the default (0.8s) often has little or no effect on the number of utterances. In a 96‑second test file, values 0.01, 0.1, 0.2, and 0.5 all produced the same number of utterances; only increasing it (1, 1.5, 2, 3) noticeably changed segmentation.[GitHub utt_split discussion]
- The feature is “best suited for especially fast or slow speakers,” and you’ll “notice more variation of fewer longer utterances when increasing utt_split, but less so of more shorter utterances when decreasing it.”[GitHub utt_split discussion]
So if your speaker has long stretches of continuous speech with very short or no pauses, even utt_split=0.1 won’t force shorter segments; there simply aren’t pauses long enough to trigger splits.
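To make that concrete, here is a minimal sketch of pause-based splitting in general. It is a toy model, not Deepgram's actual implementation: the function `split_on_pauses`, the helper `make_words`, and the synthetic word timings are all hypothetical, made up for illustration. It only shows why lowering a pause threshold cannot create splits where the audio contains no gaps.

```python
from typing import List, Tuple

# Hypothetical word record: (text, start_seconds, end_seconds).
Word = Tuple[str, float, float]


def split_on_pauses(words: List[Word], utt_split: float) -> List[List[Word]]:
    """Toy model: start a new utterance whenever the silence between two
    consecutive words is at least `utt_split` seconds."""
    utterances: List[List[Word]] = []
    current: List[Word] = []
    for word in words:
        # Gap = start of this word minus end of the previous word.
        if current and word[1] - current[-1][2] >= utt_split:
            utterances.append(current)
            current = []
        current.append(word)
    if current:
        utterances.append(current)
    return utterances


def make_words(count: int, start: float, duration: float = 0.3) -> List[Word]:
    """Synthetic continuous speech: each word begins exactly where the
    previous one ends, so there is no pause inside the run."""
    words: List[Word] = []
    t = start
    for i in range(count):
        end = t + duration
        words.append((f"w{i}", t, end))
        t = end
    return words


# Two runs of continuous speech separated by a single pause of about 1 second.
first = make_words(10, 0.0)
second = make_words(10, first[-1][2] + 1.0)
words = first + second

counts = {t: len(split_on_pauses(words, t)) for t in (0.01, 0.1, 0.5, 2.0)}
print(counts)
```

In this toy data, thresholds of 0.01, 0.1, and 0.5 all yield the same two utterances, because the only gap is the one-second pause; raising the threshold to 2.0 merges everything into one. That mirrors the pattern reported in the discussion: increasing the value changes segmentation, while decreasing it does little when the speech has no short pauses to find.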
What you can do
Within the limits of the current feature set (based on the docs and discussions):
- Confirm your request options
Make sure you’re actually sending both: