I need help with a diarization issue.
I’m using Deepgram STT with diarization enabled. For normal recorded audio files (e.g., WAV, MP3) the speaker separation works perfectly — Deepgram correctly identifies speaker 0, speaker 1, etc.
But when I generate audio by merging raw 16-bit PCM chunks (recorded via AudioWorklet in the browser) and then convert it into a single audio file, Deepgram fails to separate speakers. It returns only speaker: 0 for all utterances — both the user and the agent.
Details:
- Raw PCM, 16-bit, mono, 16kHz
- PCM chunks concatenated programmatically
- Final file plays normally
- Diarization = true
- All segments are labeled speaker 0
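For reference, the merge step described in these details can be sketched roughly as follows. This is a minimal sketch, not my exact code: it assumes the chunks arrive as little-endian 16-bit mono `bytes` objects at 16 kHz, and the function name `pcm_chunks_to_wav` is made up for illustration. It uses only Python's standard `wave` module to put a WAV header around the concatenated samples.

```python
import io
import wave

SAMPLE_RATE = 16_000   # Hz, as stated in the details above
SAMPLE_WIDTH = 2       # bytes per sample (16-bit PCM)
CHANNELS = 1           # mono

def pcm_chunks_to_wav(chunks, sample_rate=SAMPLE_RATE):
    """Concatenate raw 16-bit mono PCM chunks and wrap them in a WAV container.

    Hypothetical helper: the name and signature are illustrative only.
    """
    pcm = b"".join(chunks)
    # An odd byte count means a sample was cut or bytes were lost somewhere,
    # which would shift every following sample and corrupt the audio.
    if len(pcm) % SAMPLE_WIDTH:
        raise ValueError("PCM byte length is not a whole number of 16-bit samples")

    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)  # header sizes are patched when the writer closes
    return buf.getvalue()
```

Calling `pcm_chunks_to_wav(list_of_chunks)` returns the bytes of a complete WAV file, so the sample rate, bit depth, and channel count travel with the audio instead of having to be supplied separately.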
Questions:
- Does Deepgram require additional metadata (channel info, timestamps, etc.) for diarization to work properly on raw PCM merged audio?
- Is there a recommended way to merge PCM chunks so diarization is preserved?
- Should I convert PCM → WAV first and embed channel info?
- Are there known limitations with diarization on mono PCM?
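To make the channel-info question concrete, here is one option I am considering. It is only a sketch under a big assumption: that the user and agent audio are still available as two separate mono 16-bit buffers before merging, which may not match my real pipeline. The helper name `interleave_to_stereo_wav` is hypothetical. The idea is to carry speaker identity in the channel (user on the left, agent on the right) rather than relying on diarization of a mono mix. I believe Deepgram has a multichannel option for this kind of input, but I have not verified the exact parameter.

```python
import io
import wave
from array import array

def interleave_to_stereo_wav(user_pcm: bytes, agent_pcm: bytes,
                             sample_rate: int = 16_000) -> bytes:
    """Write two mono 16-bit PCM buffers as one stereo WAV (user=L, agent=R).

    Hypothetical helper for illustration; assumes both buffers share the
    same sample rate and start at the same instant.
    """
    left = array("h")
    left.frombytes(user_pcm)    # raises ValueError on a partial sample
    right = array("h")
    right.frombytes(agent_pcm)

    # Pad the shorter stream with silence so both channels have equal length.
    n = max(len(left), len(right))
    left.extend([0] * (n - len(left)))
    right.extend([0] * (n - len(right)))

    # Stereo PCM stores frames as L, R, L, R, ...
    stereo = array("h")
    for l_sample, r_sample in zip(left, right):
        stereo.append(l_sample)
        stereo.append(r_sample)

    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(2)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(stereo.tobytes())
    return buf.getvalue()
```

The samples are only reordered, never rescaled, so each channel should contain exactly the audio of its original mono stream.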