Deepgram Live Diarization | Deepgram | Page 1

desert idol Feb 27, 2025, 4:54 AM

#

Using Deepgram live transcription (latest version) and React, how do I get the spoken line and the speaker for each utterance?

keen wigeonBOT Feb 27, 2025, 4:54 AM

#

Hi @desert idol. I'm kapa.ai, your friendly support bot. I will try to answer your question by searching through the knowledge base. :wave:

#

To get a spoken line and the speaker for each utterance using Deepgram live transcription with React, you'll need to set up a live transcription stream and handle the incoming transcripts. While the knowledge sources don't provide a specific React example, I can guide you based on the available information about Deepgram's live transcription and diarization features.

Here's a general approach you can follow:

1. Set up Deepgram live transcription with diarization enabled:

When initializing your Deepgram live transcription, make sure to include the diarize option in your configuration. Here's an example of how you might set up the options:

const deepgramOptions = {
  punctuate: true,
  model: "nova-2",
  diarize: true,
  interim_results: true,
  utterance_end_ms: "1000",
  vad_events: true
};

2. Handle the incoming transcripts:

In your React component, you'll need to set up a listener for the transcripts. Each time a transcript is received, you can extract the spoken line and the speaker information. Here's a simplified example of how you might handle this:

deepgramLive.addListener('transcript', (transcript) => {
  if (transcript.is_final) {
    const words = transcript.channel.alternatives[0].words;
    let currentSpeaker = null;
    let currentUtterance = '';

    words.forEach((word) => {
      if (currentSpeaker !== word.speaker) {
        if (currentUtterance) {
          // Output the previous utterance
          console.log(`Speaker ${currentSpeaker}: ${currentUtterance.trim()}`);
          // Here you would update your React state to display this utterance
        }
        currentSpeaker = word.speaker;
        currentUtterance = '';
      }
      currentUtterance += word.punctuated_word + ' ';
    });

    // Output the last utterance
    if (currentUtterance) {
      console.log(`Speaker ${currentSpeaker}: ${currentUtterance.trim()}`);
      // Here you would update your React state to display this utterance
    }
  }
});


This code snippet is based on the approach shown in the [How to have speaker wise transcript](<https://github.com/orgs/deepgram/discussions/129>) discussion, adapted for live transcription.

3. Update your React state:

Instead of logging to the console, you would update your React component's state with the new utterances. This could be done by maintaining an array of utterances, each with a speaker and text property.

4. Render the utterances:

In your React component's render method, you would map over the array of utterances to display them, showing the speaker and their spoken line for each utterance.
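The steps above can be sketched as one pure helper that turns the word list of a single final transcript into an array of utterances. This is only a sketch: `Word`, `Utterance`, and `groupWordsBySpeaker` are illustrative names, not part of the Deepgram SDK, and it assumes each word carries a numeric `speaker` plus an optional `punctuated_word`, as in the listener above.

```typescript
// Hypothetical shapes for illustration; only `speaker`, `word`, and
// `punctuated_word` are read from each word object.
interface Word {
  word: string;
  punctuated_word?: string;
  speaker?: number;
}

interface Utterance {
  speaker: number | null;
  text: string;
}

// Collapse consecutive words from the same speaker into one utterance.
function groupWordsBySpeaker(words: Word[]): Utterance[] {
  const utterances: Utterance[] = [];
  for (const w of words) {
    const speaker = w.speaker ?? null;
    // Fall back to the raw word when punctuation is not enabled.
    const token = w.punctuated_word ?? w.word;
    const last = utterances[utterances.length - 1];
    if (last && last.speaker === speaker) {
      last.text += " " + token;
    } else {
      utterances.push({ speaker, text: token });
    }
  }
  return utterances;
}
```

In a component, the result could be appended to state with something like `setUtterances(prev => [...prev, ...groupWordsBySpeaker(words)])` and rendered by mapping over the array.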

Please note that the exact implementation details may vary depending on the specific version of the Deepgram SDK you're using and how you've set up your React application. You may need to adjust the code to fit your specific setup and requirements.

For the most up-to-date and React-specific implementation details, I recommend checking the official Deepgram documentation or reaching out to Deepgram support for React-specific examples.

[Deepgram Discussions](<https://github.com/orgs/deepgram/discussions/129>) provides some insights into handling speaker-wise transcripts, which you can adapt for your React application.
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------

#

@desert idol Kapa.ai is still learning and improving, please let me know how I did by reacting below.

desert idol Feb 27, 2025, 5:05 AM

#

@keen wigeon
  socket.on("transcript", (transcript) => {
    if (transcript.is_final) {
      const words = transcript.channel.alternatives[0].words;
      let currentSpeaker: string | null = null;
      let currentUtterance = "";

      words.forEach((word: any) => {
        if (currentSpeaker !== word.speaker) {
          if (currentUtterance) {
            // Output the previous speaker's utterance
            console.log(
              `Speaker ${currentSpeaker}: ${currentUtterance.trim()}`
            );
            // Update React state here if needed
          }
          currentSpeaker = word.speaker;
          currentUtterance = "";
        }
        currentUtterance += word.punctuated_word + " ";
      });

      // Output the last utterance
      if (currentUtterance) {
        console.log(
          `Speaker ${currentSpeaker}: ${currentUtterance.trim()}`
        );
        // Update React state here if needed
      }
    }
  });

How can I make it so that the utterance keeps growing while the same speaker talks without anyone interjecting, and when a new speaker comes in, the utterance resets and a new one starts?

keen wigeonBOT Feb 27, 2025, 5:05 AM

#

Let me check the knowledge base for you again. One moment! :books:

#

To modify the code so that utterances grow for a single speaker and reset when a new speaker starts, you can make a few adjustments to your existing code. Here's how you can achieve this:

let currentSpeaker: string | null = null;
let currentUtterance = "";

socket.on("transcript", (transcript) => {
  if (transcript.is_final) {
    const words = transcript.channel.alternatives[0].words;

    words.forEach((word: any) => {
      if (currentSpeaker !== word.speaker) {
        if (currentUtterance) {
          // Output the previous speaker's utterance
          console.log(`Speaker ${currentSpeaker}: ${currentUtterance.trim()}`);
          // Update React state here if needed
        }
        currentSpeaker = word.speaker;
        currentUtterance = ""; // Reset utterance for new speaker
      }
      currentUtterance += word.punctuated_word + " ";
    });

    // Output the current utterance (it will continue to grow for the same speaker)
    console.log(`Speaker ${currentSpeaker}: ${currentUtterance.trim()}`);
    // Update React state here if needed
  }
});

The key changes in this code are:

1. Move currentSpeaker and currentUtterance outside the event listener. This allows these variables to persist across multiple transcript events, so an utterance can keep growing from one final transcript to the next.
2. Keep the output inside the forEach loop for speaker changes only. It fires when a new speaker's first word arrives and reports the previous speaker's complete utterance.
3. Always output the current utterance at the end of processing all words (the if (currentUtterance) guard is gone). This ensures that the growing utterance for the current speaker is always displayed.

This approach will allow the utterance to grow as long as the same speaker continues talking. When a new speaker starts, it will output the previous speaker's complete utterance and start a new one for the new speaker.

#

To integrate this with React, you would typically update your component's state instead of using console.log. For example:

const [utterances, setUtterances] = useState<Array<{speaker: string, text: string}>>([]);

// Inside your event listener:
setUtterances(prevUtterances => [
  ...prevUtterances,
  { speaker: currentSpeaker, text: currentUtterance.trim() }
]);

Remember to handle the initial case where currentSpeaker is null, and consider how you want to update the UI when utterances grow (you might want to update the last utterance instead of adding a new one if the speaker hasn't changed).
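The "update the last utterance" idea can be sketched as a small pure function that you pass to the state setter. This is an illustration under assumptions, not SDK code: `Turn` and `appendToTurns` are made-up names, `speaker` is typed as a number because Deepgram reports speakers as integers, and `text` is meant to be only the words from the newest final transcript rather than the whole accumulated utterance.

```typescript
// Hypothetical state shape: one entry per uninterrupted speaker turn.
interface Turn {
  speaker: number;
  text: string;
}

// Return a new array: extend the last turn when the speaker is unchanged,
// otherwise append a new turn. The input array is never mutated.
function appendToTurns(turns: Turn[], speaker: number, text: string): Turn[] {
  const last = turns[turns.length - 1];
  if (last && last.speaker === speaker) {
    return [...turns.slice(0, -1), { speaker, text: `${last.text} ${text}` }];
  }
  return [...turns, { speaker, text }];
}
```

With this shape, the listener could call `setUtterances(prev => appendToTurns(prev, speaker, newText))` once per run of words from the same speaker, which keeps a single growing line per turn and starts a new line when the speaker changes.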

This approach is based on the principles discussed in the How to have speaker wise transcript GitHub discussion, adapted for your specific use case of growing utterances per speaker.


desert idol Feb 27, 2025, 5:11 AM

#

@keen wigeon
Speaker: 0, Text: Alright. Let's do this.
Speaker: 0, Text: Alright. Let's do this. Okay. How does this look now?
Speaker: 0, Text: Alright. Let's do this. Okay. How does this look now? Wait. What? No. That's not what I want.
Speaker: 0, Text: Alright. Let's do this. Okay. How does this look now? Wait. What? No. That's not what I want. Not even close.

That's not what I wanted. I want just one line with speaker 0 and "Alright. Let's do this. Okay. How does this look now? Wait. What? No. That's not what I want. Not even close.", not all the previous versions.

keen wigeonBOT Feb 27, 2025, 5:11 AM

#

Let me check the knowledge base for you again. One moment! :books:

#

I understand your concern. You want a single line for Speaker 0 with the complete utterance, rather than multiple lines showing the progression of the utterance. Let's modify the approach to achieve what you're looking for.

Based on the knowledge from the How to have speaker wise transcript discussion, we can adapt the code to accumulate the full utterance for each speaker and only output or update the state when there's a speaker change or at the end of the transcript. Here's how you can modify your code:

let currentSpeaker: string | null = null;
let currentUtterance = "";

socket.on("transcript", (transcript) => {
  if (transcript.is_final) {
    const words = transcript.channel.alternatives[0].words;

    words.forEach((word: any) => {
      if (currentSpeaker !== word.speaker) {
        if (currentUtterance) {
          // Output the complete utterance for the previous speaker
          console.log(`Speaker ${currentSpeaker}: ${currentUtterance.trim()}`);
          // Update React state here if needed
        }
        currentSpeaker = word.speaker;
        currentUtterance = "";
      }
      currentUtterance += word.punctuated_word + " ";
    });

    // Output the final utterance for the current speaker
    if (currentUtterance) {
      console.log(`Speaker ${currentSpeaker}: ${currentUtterance.trim()}`);
      // Update React state here if needed
    }
  }
});

In this modified version:

We keep track of the currentSpeaker and currentUtterance outside the event listener to maintain state across transcript events.
We only output (or update React state) when there's a speaker change or at the end of processing all words.
The currentUtterance accumulates all words for the current speaker, giving you the complete utterance.

#

To integrate this with React, you could update your state like this:

const [utterances, setUtterances] = useState<Array<{speaker: string, text: string}>>([]);

// Inside your event listener, replace console.log with:
if (currentSpeaker !== null) {
  // Snapshot the values now; the updater below may run after they change
  const speaker = currentSpeaker;
  const text = currentUtterance.trim();

  setUtterances(prevUtterances => {
    const last = prevUtterances[prevUtterances.length - 1];

    if (last && last.speaker === speaker) {
      // Same speaker is still talking: replace the last entry with the longer text
      return [...prevUtterances.slice(0, -1), { speaker, text }];
    }

    // A new speaker started: add a new utterance
    return [...prevUtterances, { speaker, text }];
  });
}

This approach will give you a single, continuously updated line for each speaker turn, containing the full utterance for that turn. Comparing against the last entry (rather than searching for any earlier entry by the same speaker) keeps a speaker's earlier turns intact when they talk again later, and building new objects instead of editing the previous state avoids mutating React state in place. The React state will only show the most recent, complete version of each turn, not the intermediate steps.

The null check above covers the initial case where `currentSpeaker` is null. You may still want to consider how to manage very long utterances in your UI design.


desert idol Feb 27, 2025, 4:48 PM

#

@keen wigeon
        if (data) {
          // Update token in Redux or local state
          dispatch(setCredentials(data));

          // Disconnect the old socket
          socket.disconnect();

          // Create a new socket connection with the refreshed token
          const newSocket = io(import.meta.env.VITE_API_URL, {
            auth: {
              token: data.accessToken,
            },
          });

          // Update the socket reference
          socketRef.current = newSocket;

          console.log("Reconnected to socket with refreshed token");

          // Reinitialize the MediaRecorder with the new socket
          if (mediaRecorderRef.current) {
            mediaRecorderRef.current.stop(); // Stop the old MediaRecorder
          }

          // Access the microphone again
          const stream = await navigator.mediaDevices.getUserMedia({
            audio: true,
          });
          mediaRecorderRef.current = new MediaRecorder(stream);

          mediaRecorderRef.current.ondataavailable = (event: BlobEvent) => {
            if (event.data.size > 0 && socketRef.current) {
              socketRef.current.emit("audioData", event.data);
            }
          };

          // Start the MediaRecorder
          mediaRecorderRef.current.start(250); // Send data every 250ms
          setIsRecording(true);
        }
      } catch (error) {
        console.log("Token refresh error:", error);
        dispatch(removeCredentials());
      }
    }
  });

After getting a refreshed token, I want to create a new socket and record from there. How do I fix this? It won't work.

keen wigeonBOT Feb 27, 2025, 4:48 PM

#

Let me check the knowledge base for you again. One moment! :books:

#

Based on the provided knowledge sources and the context of your question, I don't have specific information about handling token refresh and socket reconnection in the context of Deepgram's live transcription service. However, I can provide some general guidance based on the code you've shared and common practices for socket reconnection.

The approach you're taking seems reasonable, but there might be a few things to consider:

1. Deepgram WebSocket Connection:
Your code is creating a new socket connection using Socket.IO, but for Deepgram, you typically need to create a WebSocket connection specifically for Deepgram's API. The Deepgram WebSocket connection is usually created like this:
```
const ws = new WebSocket(`wss://api.deepgram.com/v1/listen?encoding=...&sample_rate=...`, [
  'token',
  NEW_ACCESS_TOKEN
]);
```

2. Reinitializing Deepgram Connection:
After getting the new token, you should reinitialize the Deepgram WebSocket connection instead of a generic Socket.IO connection. This might look something like:

if (data) {
  // Update token in Redux or local state
  dispatch(setCredentials(data));

  // Close the old WebSocket if it exists
  if (socketRef.current) {
    socketRef.current.close();
  }

  // Create a new Deepgram WebSocket connection with the refreshed token
  const newSocket = new WebSocket(`wss://api.deepgram.com/v1/listen?encoding=...&sample_rate=...`, [
    'token',
    data.accessToken
  ]);

  // Update the socket reference
  socketRef.current = newSocket;

  console.log("Reconnected to Deepgram with refreshed token");

  // Set up event listeners for the new WebSocket
  newSocket.onopen = () => {
    console.log("Deepgram WebSocket connection opened");
    // Start sending audio data here
  };

  newSocket.onmessage = (event) => {
    // Handle incoming transcripts
  };

  // Reinitialize the MediaRecorder
  // ... (your existing MediaRecorder code)
}


3. Error Handling:
Make sure to add appropriate error handling for the WebSocket connection:

```javascript
newSocket.onerror = (error) => {
  console.error("WebSocket error:", error);
};

newSocket.onclose = (event) => {
  console.log("WebSocket closed:", event.code, event.reason);
};
```

4. Sending Audio Data:
Ensure that you're sending the audio data in the correct format expected by Deepgram. The Deepgram WebSocket connection example shows:

mediaRecorder.addEventListener('dataavailable', event => {
  if (event.data.size > 0 && ws.readyState == 1) {
    ws.send(event.data);
  }
});

Remember to adjust the WebSocket URL parameters (encoding, sample rate, etc.) according to your specific requirements.
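One thing that can go wrong after a reconnect is that the MediaRecorder starts emitting chunks before the new socket has finished opening, so the readyState check silently drops them. A possible way around that is a small buffer that holds chunks until the socket is open. This is a sketch under assumptions: `SocketLike` and `createBufferedSender` are hypothetical names, and the only things it relies on are a numeric `readyState` and a `send` method.

```typescript
// Minimal subset of the WebSocket API this sketch relies on (hypothetical name).
interface SocketLike<T> {
  readyState: number;
  send(data: T): void;
}

const SOCKET_OPEN = 1; // same value as WebSocket.OPEN

function createBufferedSender<T>(socket: SocketLike<T>) {
  const queue: T[] = [];

  // Send queued chunks in their original order while the socket is open.
  function flush(): void {
    while (queue.length > 0 && socket.readyState === SOCKET_OPEN) {
      socket.send(queue.shift() as T);
    }
  }

  // Send immediately when open, otherwise hold the chunk for later.
  function send(chunk: T): void {
    if (socket.readyState === SOCKET_OPEN) {
      flush();
      socket.send(chunk);
    } else {
      queue.push(chunk);
    }
  }

  return { send, flush, pending: () => queue.length };
}
```

You would call `sender.send(event.data)` from the `dataavailable` handler and `sender.flush()` from the socket's `onopen` handler. Keeping the chunks in order matters because the first chunk a MediaRecorder produces typically carries the container header.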

If you're still experiencing issues after making these adjustments, it would be helpful to check the console for any error messages and ensure that your new access token is valid. If problems persist, you may want to reach out to Deepgram's support for more specific guidance on handling token refresh with their WebSocket API.

Deepgram Discussions provides some insights into setting up WebSocket connections, which you can adapt for your token refresh scenario.


desert idol Feb 27, 2025, 9:03 PM

#

@keen wigeon what is the cost of using keep alive vs sending audio to the socket?

keen wigeonBOT Feb 27, 2025, 9:03 PM

#

Let me check the knowledge base for you again. One moment! :books:

urban fernBOT Feb 27, 2025, 9:03 PM

#

Hey there, it looks like you've been talking with Kapa for a bit. If you need additional help, you should consider moving to #1115960287183511643 where our team can answer your query.

keen wigeonBOT Feb 27, 2025, 9:03 PM

#

The cost of using the KeepAlive feature versus sending audio to the socket with Deepgram is addressed in the knowledge sources. Here's a summary of how the costs are structured:

Using KeepAlive:
- If you stop sending audio data and use the KeepAlive message, Deepgram will keep the WebSocket connection open without charging for the time when no audio data is being sent. This allows you to maintain the connection without incurring additional costs for silence.
Sending Audio Data:
- When you send audio data to Deepgram, you are charged for the transcription of that audio. This includes any audio data sent, whether it contains speech, silence, or noise.
Silent Audio:
- If you send silent audio (audio that contains no speech) to Deepgram, you will be charged for the transcription of that silent audio. Deepgram processes the audio to determine its content, and charges apply once the transcription process is invoked.

For more detailed information, you can refer to the discussion on Pricing: Do you charge for audios totally in silence or audios with only noise?, which explains the cost implications of using KeepAlive and sending audio data.

In summary, using `KeepAlive` is a cost-effective way to keep the connection open without incurring charges for periods of silence, whereas sending audio data will result in charges based on the transcription of that audio.


desert idol Feb 28, 2025, 12:58 AM

#

@keen wigeon in my deepgram socket how do I do a keepalive?

keen wigeonBOT Feb 28, 2025, 12:58 AM

#

Let me check the knowledge base for you again. One moment! :books:

#

To implement a keepalive mechanism in your Deepgram WebSocket connection, you need to send a specific keepalive message to the server at regular intervals. Here's how you can do it:

Send a keepalive message:
To send a keepalive message, you need to send a JSON object with the type "KeepAlive" to the WebSocket. Here's an example of how to do this:
```
ws.send(JSON.stringify({ type: "KeepAlive" }));
```

Set up an interval to send keepalive messages:
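You should send these keepalive messages every 3 to 5 seconds. Deepgram closes a streaming connection that receives neither audio nor a KeepAlive message for about 10 seconds, so an interval close to 10 seconds leaves no margin. Here's an example of how to set this up: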

const keepAliveInterval = setInterval(() => {
  if (ws.readyState === WebSocket.OPEN) {
    ws.send(JSON.stringify({ type: "KeepAlive" }));
  }
}, 5000); // Send keepalive every 5 seconds

Clear the interval when closing the connection:
Make sure to clear the interval when you're done with the connection to avoid sending keepalive messages after the connection is closed:
```
ws.onclose = () => {
  clearInterval(keepAliveInterval);
};
```

Here's a more complete example incorporating these steps:

const ws = new WebSocket(`wss://api.deepgram.com/v1/listen?encoding=...&sample_rate=...`, [
  'token',
  YOUR_ACCESS_TOKEN
]);

ws.onopen = () => {
  console.log("Deepgram WebSocket connection opened");
  
  // Set up keepalive interval
  const keepAliveInterval = setInterval(() => {
    if (ws.readyState === WebSocket.OPEN) {
      ws.send(JSON.stringify({ type: "KeepAlive" }));
    }
  }, 5000); // Send keepalive every 5 seconds

  // Clear interval when connection closes
  ws.onclose = () => {
    clearInterval(keepAliveInterval);
    console.log("Deepgram WebSocket connection closed");
  };
};

// Other event handlers...

#

This approach is based on the information provided in the Sudden Deepgram Failure discussion, which mentions the use of KeepAlive messages to maintain the connection.

Remember, using KeepAlive is a cost-effective way to maintain the connection during periods of silence without incurring charges, as mentioned in the Pricing discussion.

If you're using a specific Deepgram SDK, it might have built-in support for keepalive messages. For example, some SDKs might have an `EnableKeepAlive` option that you can set to `true` when initializing the connection. Always refer to the documentation for your specific SDK version for the most accurate implementation details.


desert idol Feb 28, 2025, 2:47 PM

#

@keen wigeon
dgConnection.on(LiveTranscriptionEvents.Open, () => {
  console.log("Deepgram connection opened");

  dgConnection.on(LiveTranscriptionEvents.Transcript, (data) => {
    socket.emit("transcript", data);
  });

  socket.on("audioData", (audioChunk) => {
    if (dgConnection.getReadyState() === 1) {
      dgConnection.send(audioChunk);
    }
  });
});

dgConnection.on(LiveTranscriptionEvents.Close, () => {
  console.log("Deepgram connection closed");
});

dgConnection.on(LiveTranscriptionEvents.Error, (error) => {
  console.error("Deepgram error:", error);
});

socket.on("disconnect", () => {
  console.log(`Client disconnected: ${socket.id}`);
  dgConnection.requestClose();
});

});

This is my deepgram socketio socket, add keepalive here

keen wigeonBOT Feb 28, 2025, 2:47 PM

#

Let me check the knowledge base for you again. One moment! :books:

#

Certainly! I can help you add the keepalive functionality to your Deepgram WebSocket connection using Socket.IO. Based on the information from the Sudden Deepgram Failure discussion and the Keep connection alive thread, here's how you can modify your code to include keepalive messages:

dgConnection.on(LiveTranscriptionEvents.Open, () => {
  console.log("Deepgram connection opened");

  dgConnection.on(LiveTranscriptionEvents.Transcript, (data) => {
    socket.emit("transcript", data);
  });

  socket.on("audioData", (audioChunk) => {
    if (dgConnection.getReadyState() === 1) {
      dgConnection.send(audioChunk);
    }
  });

  // Set up keepalive interval
  const keepAliveInterval = setInterval(() => {
    if (dgConnection.getReadyState() === 1) {
      dgConnection.keepAlive();
      console.log("Sent keepalive to Deepgram");
    }
  }, 10000); // Send keepalive every 10 seconds

  // Clear interval when connection closes
  dgConnection.on(LiveTranscriptionEvents.Close, () => {
    clearInterval(keepAliveInterval);
    console.log("Deepgram connection closed");
  });
});

dgConnection.on(LiveTranscriptionEvents.Error, (error) => {
  console.error("Deepgram error:", error);
});

socket.on("disconnect", () => {
  console.log(`Client disconnected: ${socket.id}`);
  dgConnection.requestClose();
});

In this modified version:

We've added a keepAliveInterval that sends a keepalive message every 10 seconds using the dgConnection.keepAlive() method. This is based on the Deepgram SDK's built-in keepalive functionality, as mentioned in the Sudden Deepgram Failure discussion.
The interval is set up when the Deepgram connection is opened.
We check if the connection is still open (getReadyState() === 1) before sending the keepalive message.
The interval is cleared when the Deepgram connection is closed to prevent sending keepalive messages after the connection ends.
We've added a console log to confirm when keepalive messages are sent, which can be useful for debugging.

This approach should help maintain the connection during periods of inactivity without incurring additional charges for silent audio, as explained in the Pricing discussion.

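Remember to adjust the keepalive interval (currently set to 10 seconds) based on your specific needs and Deepgram's recommendations. Deepgram closes a streaming connection after roughly 10 seconds without audio or a KeepAlive message, so a 10-second interval sits right at that limit; an interval of 3 to 5 seconds leaves more margin.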

Deepgram Discussions provides additional context on keeping connections alive, which informed this implementation.
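If you would rather not send keepalives while audio is already flowing, one option is to send them only after the connection has been idle for a while. The sketch below is an assumption-laden illustration, not SDK behavior: `KeepAliveTarget` and `createIdleKeepAlive` are hypothetical names, and the 3000 ms default is just a value chosen to stay well under the idle timeout.

```typescript
// Hypothetical adapter so the sketch works with a raw WebSocket or an SDK connection.
interface KeepAliveTarget {
  isOpen(): boolean;
  sendKeepAlive(): void;
}

function createIdleKeepAlive(
  target: KeepAliveTarget,
  idleMs: number = 3000,
  now: () => number = () => Date.now()
) {
  // Time of the last audio chunk or keepalive sent on this connection.
  let lastActivityAt = now();

  return {
    // Call this whenever an audio chunk is forwarded.
    noteAudioSent(): void {
      lastActivityAt = now();
    },
    // Call this on a short timer; returns true when a keepalive was sent.
    tick(): boolean {
      if (!target.isOpen()) return false;
      if (now() - lastActivityAt < idleMs) return false;
      target.sendKeepAlive();
      lastActivityAt = now();
      return true;
    },
  };
}
```

With the connection from this thread, the target could be `{ isOpen: () => dgConnection.getReadyState() === 1, sendKeepAlive: () => dgConnection.keepAlive() }`, with `noteAudioSent()` called in the "audioData" handler and `tick()` called from a `setInterval` of about one second.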


desert idol Feb 28, 2025, 2:54 PM

#

@keen wigeon I want to mediarecorder.pause() and resume, how do I handle that on my frontend and backend? Using React for the frontend

keen wigeonBOT Feb 28, 2025, 2:54 PM

#

Let me check the knowledge base for you again. One moment! :books:
