#Diarization doesn't work

tender pulsarBOT

Hi @merry igloo. I'm kapa.ai, your friendly support bot. I will try to answer your question by searching through the knowledge base. :wave:

faint cedarBOT

Hey there! This channel allows you to ask Kapa.ai for assistance. But it looks like you haven't connected your Discord profile to your Deepgram account yet. You can do so at https://community.deepgram.com - by doing so, you'll be able to ask questions in #1115960287183511643 if you need human assistance!

tender pulsarBOT

It sounds like you're experiencing an issue where Deepgram's diarization is not distinguishing between your AI agent and human speakers, even though their voices are very different. This is a known challenge and has been reported by other users as well. Here are some key points and troubleshooting steps based on Deepgram's documentation and community discussions:

1. Audio Quality and Format Matter

Diarization accuracy can be significantly affected by audio quality, encoding, and the way the audio is captured. If both speakers are on the same channel and the audio is low quality or compressed, the model may struggle to differentiate between them, even if their voices sound distinct to a human ear. Try to ensure your audio is high quality and, if possible, that each speaker is recorded on a separate channel (see the MultiChannel feature below). (Source: Speaker diarization not working)

2. Model Selection

You are using the nova-2-phonecall model, which is generally appropriate for phone call audio. However, diarization performance can vary between models and sometimes even between requests. If diarization is not working as expected, try switching to other models such as nova-2-general or nova-2-meeting to see if results improve. (Source: Diarization not working)
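To make the model comparison concrete, here is a minimal Python sketch. It assumes the pre-recorded endpoint `https://api.deepgram.com/v1/listen` with `model` and `diarize` query parameters, and a response in which each word carries a `speaker` index when diarization is on. The helper names `build_listen_url` and `count_speakers` are made up for illustration and are not part of any Deepgram SDK.

```python
from urllib.parse import urlencode

# Assumed endpoint for pre-recorded audio; check it against your account's docs.
DEEPGRAM_LISTEN_URL = "https://api.deepgram.com/v1/listen"

# The current model plus the two alternatives suggested above.
CANDIDATE_MODELS = ["nova-2-phonecall", "nova-2-general", "nova-2-meeting"]


def build_listen_url(model: str, diarize: bool = True) -> str:
    """Build a request URL for one model (hypothetical helper)."""
    params = {"model": model, "diarize": str(diarize).lower()}
    return f"{DEEPGRAM_LISTEN_URL}?{urlencode(params)}"


def count_speakers(response: dict) -> int:
    """Count distinct speaker indices in a parsed JSON response.

    Assumes the shape results.channels[].alternatives[].words[].speaker.
    Only the first alternative of each channel is inspected.
    """
    speakers = set()
    for channel in response.get("results", {}).get("channels", []):
        for alternative in channel.get("alternatives", [])[:1]:
            for word in alternative.get("words", []):
                if "speaker" in word:
                    speakers.add(word["speaker"])
    return len(speakers)


for model in CANDIDATE_MODELS:
    # Send the same audio to each URL, then compare count_speakers() per model.
    print(build_listen_url(model))
```

If every model reports a single speaker for the same file, the cause is more likely the audio itself or a backend issue than the model choice.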

3. Known Limitations and Current Issues

There have been recent reports that diarization is not working reliably on some Nova-2 models. Deepgram's engineering team is aware of these reports and is investigating. If you are consistently seeing only one speaker identified, it may be due to these ongoing issues. (Source: Diarization not working on some Nova-2 models)

4. MultiChannel Audio as a Solution

If your use case allows, consider using the MultiChannel feature, where each speaker is recorded on a separate audio channel. Speaker attribution then comes from the channel rather than from the diarization model, so separation is reliable as long as each channel carries only one voice. This is especially effective for phone calls or virtual meetings where each participant can be isolated to their own channel. (Source: Docs: Multichannel vs Diarization)
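A rough Python sketch of the multichannel approach follows. It assumes the `multichannel=true` query parameter on the same `/v1/listen` endpoint and one entry per audio channel under `results.channels` in the response. The labels `agent` and `human`, and the helper names, are hypothetical; use whatever matches how your call audio is actually mixed.

```python
from urllib.parse import urlencode


def build_multichannel_url(model: str = "nova-2-phonecall") -> str:
    """Build a request URL that asks for per-channel transcription."""
    params = {"model": model, "multichannel": "true"}
    return "https://api.deepgram.com/v1/listen?" + urlencode(params)


def transcripts_by_channel(response: dict, labels: list[str]) -> dict[str, str]:
    """Map each audio channel to a speaker label.

    Assumes results.channels[i].alternatives[0].transcript in the response.
    Channels without a supplied label fall back to "channel_<index>".
    """
    result = {}
    channels = response.get("results", {}).get("channels", [])
    for index, channel in enumerate(channels):
        label = labels[index] if index < len(labels) else f"channel_{index}"
        alternatives = channel.get("alternatives", [])
        result[label] = alternatives[0].get("transcript", "") if alternatives else ""
    return result


# Example with a hand-written response of the assumed shape.
example = {"results": {"channels": [
    {"alternatives": [{"transcript": "How can I help you today?"}]},
    {"alternatives": [{"transcript": "I need to reset my password."}]},
]}}
print(transcripts_by_channel(example, ["agent", "human"]))
```

This only helps if the AI agent and the caller are written to separate channels at capture time; a mono mixdown cannot be split back out this way.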

5. Short Audio and Speaker Differentiation

For short audio files, diarization may not have enough data to build distinct speaker profiles. Deepgram recommends prepending a 30-second sample of the primary speaker's voice (with matching encoding) to help the model differentiate speakers more reliably. (Source: Improving Diarization by Prepending Audio)
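One way to do the prepending for uncompressed WAV files is sketched below with Python's standard `wave` module. It is only an illustration: the function name `prepend_reference` is made up, it handles PCM WAV only, and it refuses to join files whose channel count, sample width, or sample rate differ, which is one reading of "matching encoding".

```python
import io
import wave


def prepend_reference(reference_wav: bytes, target_wav: bytes,
                      max_seconds: float = 30.0) -> bytes:
    """Return a WAV made of up to max_seconds of reference audio + target audio."""
    with wave.open(io.BytesIO(reference_wav), "rb") as ref, \
         wave.open(io.BytesIO(target_wav), "rb") as tgt:
        ref_format = (ref.getnchannels(), ref.getsampwidth(), ref.getframerate())
        tgt_format = (tgt.getnchannels(), tgt.getsampwidth(), tgt.getframerate())
        if ref_format != tgt_format:
            raise ValueError(f"format mismatch: {ref_format} vs {tgt_format}")
        # Keep at most max_seconds of the reference speaker.
        max_frames = int(max_seconds * ref.getframerate())
        ref_frames = ref.readframes(min(max_frames, ref.getnframes()))
        tgt_frames = tgt.readframes(tgt.getnframes())

    output = io.BytesIO()
    with wave.open(output, "wb") as dst:
        dst.setnchannels(tgt_format[0])
        dst.setsampwidth(tgt_format[1])
        dst.setframerate(tgt_format[2])
        dst.writeframes(ref_frames + tgt_frames)
    return output.getvalue()
```

Word timestamps in the resulting transcript will be shifted by the length of the prepended sample, so subtract that duration (and drop the words that fall inside it) before using the results.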

6. No User-Side Tuning

Currently, Deepgram does not offer a "temperature" or similar parameter to adjust diarization sensitivity. The results are non-deterministic and can vary between runs, even with the same settings. (Source: Diarization Results are different from using API and Playground)


Summary of Recommendations:

  • Try different models (nova-2-general, nova-2-meeting).
  • Ensure your audio is high quality and, if possible, use multi-channel audio.
  • For short files, prepend a 30-second sample of the primary speaker.
  • Be aware that diarization issues are known and being investigated by Deepgram.

If you continue to have issues, consider providing a request ID to Deepgram support so they can investigate further, as diarization performance can sometimes be affected by backend issues. (Source: Diarization not working on some Nova-2 models)
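If you are logging responses yourself, a small Python helper like the one below can pull out the request ID for a support ticket. It assumes the ID is returned under `metadata.request_id` in the JSON response and that `metadata.duration` and `metadata.channels` are present. The function name and the placeholder values in the example are invented for illustration.

```python
def support_summary(response: dict) -> str:
    """Format the fields support is likely to ask for (hypothetical helper)."""
    metadata = response.get("metadata", {})
    request_id = metadata.get("request_id", "<missing>")
    duration = metadata.get("duration", "<unknown>")
    channels = metadata.get("channels", "<unknown>")
    return f"request_id={request_id} duration_s={duration} channels={channels}"


# Hand-written example; the ID is a placeholder, not a real request.
example = {"metadata": {"request_id": "example-request-id", "duration": 42.5, "channels": 1}}
print(support_summary(example))
```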

If you need more specific troubleshooting, please share details such as the audio format, length, and a sample (if possible).

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon: