STT utterance issue with front-end with next.js | Deepgram | Page 1

frigid gyro Mar 21, 2025, 12:07 PM

I'm using STT with a Next.js frontend and Twilio for my AI agent. Everything works fine with Twilio, but when I run the same code with my frontend, the final utterance transcripts slow down over time: fast at first, then gradually slower, and after 3-4 conversations no final utterances are generated at all. Interim transcripts work fine. The backend runs on Cloudflare (managed with Wrangler) and handles the full conversation, while the frontend sends the audio data for STT.

My Deepgram STT configuration is below:

  const deepgram: any = deepgramClient.listen.live({
    model: 'nova-2-general',
    language: languageCode,
    smart_format: true,
    encoding: sttVoiceConfig[voiceCallStates.phoneNumberProviderService].encoding,
    sample_rate: sttVoiceConfig[voiceCallStates.phoneNumberProviderService].sampleRate,
    channels: 1,
    multichannel: false,
    no_delay: true,
    interim_results: true,
    endpointing: 300,
    utterance_end_ms: 1000,
    // sentiment: true,
  });

pallid stormBOT Mar 21, 2025, 12:07 PM

Hi @frigid gyro. I'm kapa.ai, your friendly support bot. I will try to answer your question by searching through the knowledge base. :wave:

split willowBOT Mar 21, 2025, 12:07 PM

Hey there! This channel allows you to ask Kapa.ai for assistance. But it looks like you haven't connected your Discord profile to your Deepgram account yet. You can do so at https://community.deepgram.com - by doing so, you'll be able to ask questions in #1115960287183511643 if you need human assistance!

pallid stormBOT Mar 21, 2025, 12:08 PM

I can help with your Deepgram STT utterance issue in your Next.js frontend application.

Based on the knowledge sources, your issue with slowing utterance transcripts and eventual failure appears to be related to WebSocket connection management. Several Deepgram users have experienced similar issues with real-time transcription.

The problem might be occurring because:

WebSocket connection management: Your frontend might not be closing or managing WebSocket connections properly between conversations, leading to resource exhaustion over time (link to discussion about WebSocket issues).
Utterance end detection: Your settings for endpointing (300 ms) and utterance_end_ms (1000 ms) might make it hard to detect the end of speech reliably, especially if there's background noise in your audio stream (link to endpointing discussion).
Event handling: You might not be handling all the necessary events from the Deepgram WebSocket connection, particularly the UtteranceEnd event (link to UtteranceEnd issue).

Here are some recommendations to fix your issue:

Ensure proper WebSocket cleanup: Make sure you're properly closing the Deepgram connection after each conversation and removing all event listeners:

// When conversation ends
deepgram.removeAllListeners();
deepgram.finish();
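
A minimal sketch of that cleanup as a per-conversation wrapper, assuming only that the live connection exposes removeAllListeners() and finish() as in the snippet above. SttSession and LiveConnectionLike are hypothetical names used for illustration, not part of the Deepgram SDK:

```typescript
// Hypothetical minimal shape of a live connection; the real SDK object has more members.
interface LiveConnectionLike {
  removeAllListeners(): void;
  finish(): void;
}

// Keeps at most one live connection open at a time.
class SttSession<T extends LiveConnectionLike> {
  private current: T | null = null;
  private open: () => T;

  constructor(open: () => T) {
    this.open = open;
  }

  // Closes any previous connection before opening a new one.
  start(): T {
    this.stop();
    this.current = this.open();
    return this.current;
  }

  // Safe to call repeatedly; does nothing when no connection is open.
  stop(): void {
    const conn = this.current;
    if (conn === null) return;
    this.current = null;
    conn.removeAllListeners();
    conn.finish();
  }

  get isOpen(): boolean {
    return this.current !== null;
  }
}
```

With a wrapper like this, the frontend would call start() at the beginning of each conversation and stop() when it ends or when the component unmounts, so an old connection cannot stay open alongside a new one.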

Adjust endpointing settings: For better performance with potentially noisy audio:

const deepgram = deepgramClient.listen.live({
  // Your other settings...
  endpointing: 200,        // Shorter endpointing time
  utterance_end_ms: 2000,  // Longer utterance end time
  vad_events: true         // Add this to get voice activity detection events
});

Add proper event handling for all Deepgram events, especially for connection issues:

import { LiveTranscriptionEvents } from "@deepgram/sdk";

deepgram.on(LiveTranscriptionEvents.Close, () => {
  console.log("Connection closed");
  // Clean up resources here
});

deepgram.on(LiveTranscriptionEvents.Error, (error) => {
  console.error("Deepgram error:", error);
  // Handle error appropriately
});
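
One way the transcript and UtteranceEnd events could be combined into final utterances is sketched below. It assumes the is_final, speech_final, and channel.alternatives[0].transcript fields of Deepgram's live responses; UtteranceAssembler is a hypothetical helper, not an SDK class:

```typescript
// Subset of the live message fields this sketch relies on.
interface ResultsMessage {
  type: "Results";
  is_final: boolean;
  speech_final: boolean;
  channel: { alternatives: { transcript: string }[] };
}

interface UtteranceEndMessage {
  type: "UtteranceEnd";
}

type LiveMessage = ResultsMessage | UtteranceEndMessage;

// Collects finalized segments and emits one string per utterance.
class UtteranceAssembler {
  private parts: string[] = [];
  private onUtterance: (text: string) => void;

  constructor(onUtterance: (text: string) => void) {
    this.onUtterance = onUtterance;
  }

  handle(message: LiveMessage): void {
    if (message.type === "UtteranceEnd") {
      // Fallback for when speech_final never arrives (for example, noisy audio).
      this.flush();
      return;
    }
    const text = message.channel.alternatives[0]?.transcript ?? "";
    // Interim results (is_final === false) are ignored here.
    if (message.is_final && text.length > 0) this.parts.push(text);
    if (message.speech_final) this.flush();
  }

  // Emits the buffered segments, if any, and resets the buffer.
  private flush(): void {
    if (this.parts.length === 0) return;
    const utterance = this.parts.join(" ");
    this.parts = [];
    this.onUtterance(utterance);
  }
}
```

In the SDK this would presumably be wired up by passing the payloads of LiveTranscriptionEvents.Transcript and LiveTranscriptionEvents.UtteranceEnd to handle(). Because the buffer is emptied on every flush, an UtteranceEnd that follows a speech_final result does not emit the same utterance twice.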

Consider adding a keepalive mechanism if your connections are staying open for long periods:

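// Assumes the Deepgram JS SDK v3 live client, which exposes getReadyState() and keepAlive()
const keepAliveTimer = setInterval(() => {
  if (deepgram && deepgram.getReadyState() === 1) { // 1 = WebSocket OPEN
    deepgram.keepAlive(); // Sends { "type": "KeepAlive" } as a text message
  }
}, 3000); // Send keepalive every 3 seconds

// When the conversation ends, stop the timer so it does not leak
clearInterval(keepAliveTimer);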

The gradual slowdown suggests a resource leak or accumulation issue, which is common in applications that create new WebSocket connections without properly cleaning up old ones.

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:

@frigid gyro Kapa.ai is still learning and improving, please let me know how I did by reacting below.
