#Strange pause when trying to convert TTS -- seems to happen at random times around ellipses

round mantle

Not sure why this happens. But I get this weird pause and clicking sometimes (maybe 3% of the time) when I try to convert my text to speech around ellipses, esp. phrases like "...you know..." etc.

Can you give me some feedback on my code and tell me if I'm doing anything wrong?
// Add 'container: none' to prevent header issues that cause distorted/creepy sounds
// This prevents header information from being misinterpreted as audio data

const response = await deepgram.speak.request(
  { text: processedText }, 
  { 
    model: voiceModel,
    container: 'none' // Prevent header misinterpretation that can cause creepy sounds
  }
);

const stream = await response.getStream();
const chunks: Uint8Array[] = [];

const reader = stream.getReader();
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  chunks.push(value);
}

// Combine all chunks into a single Uint8Array
const rawAudioData = chunks.reduce(
  (acc, chunk) => {
    const newArray = new Uint8Array(acc.length + chunk.length);
    newArray.set(acc);
    newArray.set(chunk, acc.length);
    return newArray;
  },
  new Uint8Array(0)
);

// Create a proper WAV container for the audio data
// Since we requested container: 'none', we need to manually add WAV headers
// for proper playback in browsers and other audio players
const audioWithHeader = addWavHeader(rawAudioData);

await logToAudit(medplum, taskId, 'Audio generation complete', {
  rawAudioSize: rawAudioData.length,
  finalAudioSize: audioWithHeader.length,
  wasTruncated,
  usingContainer: 'WAV (manually added)'
});

return audioWithHeader;

} catch (error) {
  await logToAudit(medplum, taskId, 'Error generating audio', {
    error: (error as Error).message
  });
  // Template literal restored; without backticks this line does not compile
  throw new Error(`Failed to generate audio: ${(error as Error).message}`);
}
}

compact adderBOT

Hi @round mantle. I'm kapa.ai, your friendly support bot. I will try to answer your question by searching through the knowledge base. :wave:

Based on your code and the issue you're experiencing with pauses and clicking sounds around ellipses in your text-to-speech conversions, I can provide some insights.

The Clicking Issue

The clicking sound you're experiencing is a known issue with Deepgram's TTS. You've already implemented the recommended solution by setting container: 'none' in your request options, which is good. This prevents header information from being misinterpreted as audio data.

According to the documentation:

"If you encounter a problem of hearing "clicks" in the audio, we recommend that you add container=none to the request." (FAQ & Troubleshooting)

Pauses Around Ellipses

The pauses you're experiencing around ellipses are actually an intended feature of Deepgram's TTS. Ellipses (...) are specifically used to create natural pauses in the speech:

"If you need to insert a longer pause in your audio, use the ellipsis (...). To include longer pauses, you can insert more dots in groups of three (e.g., 6 dots)." (Text to Speech Prompting)

There's also an unofficial way to control speech speed mentioned in a Deepgram discussion:

"One way to slow down the speech is to use ellipses. It isn't a stable feature, so may (and probably will) break in future versions. But for now (until official speed control is released) you can add "…" in between sections where you need to insert a pause." (Deepgram Discussion)

Silent Pauses vs. Natural Pauses

If you want more control over the type of pause, Deepgram offers two types:

  1. Natural Pauses (using ...): These can sometimes be a long silent pause, a breath, or an elongation of words.
  2. Silent Pauses (using . . . with spaces): These have a higher probability of staying silent during the pause duration. For example:
     "To confirm, is your registration number BY. . 3984. . 0297?"

This might help with the inconsistency you're experiencing.
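If you want to try the spaced form without editing every string by hand, a small preprocessing step can rewrite ellipses before the text is sent. This is only a sketch: toSpacedPauses is a hypothetical helper, not part of the Deepgram SDK, and whether the spaced form actually reduces the artifacts you hear is something you would need to verify by ear.

```typescript
// Hypothetical helper (not a Deepgram API): rewrites each run of three or
// more dots, or the single-character ellipsis (U+2026), into ". . .".
// Note: longer runs such as "......" collapse to one spaced pause.
function toSpacedPauses(text: string): string {
  return text.replace(/\.{3,}|\u2026/g, '. . .');
}
```

You could call it on processedText just before the speak request. A single period at the end of a sentence is left untouched.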

Code Review

Your code looks good overall. You're:

  1. Setting container: 'none' correctly
  2. Properly reading the stream
  3. Adding a WAV header manually for proper playback
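Your addWavHeader helper is not shown, so it is worth checking against a reference layout. Below is a minimal sketch of a 44-byte RIFF/WAVE header writer. It assumes the response body is raw 16-bit little-endian PCM, mono, at 24000 Hz. None of these values are confirmed by your snippet, and the request shown does not set an encoding explicitly, so verify them against what you actually request. A header that disagrees with the real audio format can itself cause noise or wrong playback speed.

```typescript
// Sketch of a WAV header writer. Defaults are assumptions, not values
// confirmed by the thread: 16-bit PCM, mono, 24000 Hz.
function addWavHeader(
  pcm: Uint8Array,
  sampleRate = 24000,
  channels = 1,
  bitsPerSample = 16
): Uint8Array {
  const headerSize = 44;
  const blockAlign = (channels * bitsPerSample) / 8; // bytes per sample frame
  const byteRate = sampleRate * blockAlign;          // bytes per second
  const out = new Uint8Array(headerSize + pcm.length);
  const view = new DataView(out.buffer);

  const writeAscii = (offset: number, s: string): void => {
    for (let i = 0; i < s.length; i++) view.setUint8(offset + i, s.charCodeAt(i));
  };

  writeAscii(0, 'RIFF');
  view.setUint32(4, 36 + pcm.length, true); // file size minus the first 8 bytes
  writeAscii(8, 'WAVE');
  writeAscii(12, 'fmt ');
  view.setUint32(16, 16, true);             // fmt chunk size for plain PCM
  view.setUint16(20, 1, true);              // audio format 1 = PCM
  view.setUint16(22, channels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, byteRate, true);
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, bitsPerSample, true);
  writeAscii(36, 'data');
  view.setUint32(40, pcm.length, true);     // size of the audio payload
  out.set(pcm, headerSize);                 // copy samples after the header
  return out;
}
```

All multi-byte fields are little-endian, which is why every DataView write passes true as the last argument.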

One thing to note is that the documentation mentions that silent pauses are "only available as a prompting work around for our current Aura model and may not be available in our future models." (Text to Speech Prompting)

If you're still experiencing issues, you might want to try analyzing your audio with FFmpeg as suggested in the documentation to better understand what's happening with the audio signal at those points where you hear clicks or unusual pauses.
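As a rough complement to inspecting the file in FFmpeg, you can also scan the raw samples in code for abrupt jumps, which is often what a click looks like in the waveform. The sketch below is a hypothetical diagnostic, not a Deepgram or FFmpeg feature. It assumes 16-bit little-endian mono PCM, and the default threshold is an arbitrary starting point that you would need to tune.

```typescript
// Hypothetical diagnostic: returns the sample indices where the amplitude
// changes by more than `threshold` between two consecutive samples.
// Assumes 16-bit little-endian mono PCM; a trailing odd byte is ignored.
function findJumps(pcm: Uint8Array, threshold = 12000): number[] {
  const view = new DataView(pcm.buffer, pcm.byteOffset, pcm.byteLength);
  const sampleCount = Math.floor(pcm.byteLength / 2);
  const jumps: number[] = [];
  let prev = sampleCount > 0 ? view.getInt16(0, true) : 0;
  for (let i = 1; i < sampleCount; i++) {
    const cur = view.getInt16(i * 2, true);
    if (Math.abs(cur - prev) > threshold) jumps.push(i);
    prev = cur;
  }
  return jumps;
}
```

Running it on rawAudioData for a clip that clicks, and dividing each index by the sample rate, gives approximate timestamps you can compare against where the ellipses fall in the text.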
