#Speaker Labeling Regression/Mismatch


lethal cargo
#

Recently we have been experiencing what appears to be a significant regression in speaker diarization quality for prerecorded speech-to-text, especially with Nova-3.

Here are the IDs of three requests for the same audio, made from different sources:

001d5ab5-b650-46a8-8d35-e75bb32ad6e8: Nova-2 in playground. Only one that does diarization well

1e91aeab-93b8-459d-a12b-8b0c2a1ebb92: Nova-3 in the playground. Bad

12baa477-8cdf-4749-8b85-c8906a1d9a4d: Nova-2 via the API. Also bad

Hopefully this is just a temporary bug that can be acknowledged or resolved, as we have had multiple customer complaints about this issue in the last week or so. If not, we will likely need to migrate off the platform.
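
To make a comparison like the one above concrete, it can help to collapse each response's word list into speaker turns and diff them between requests. This is only a minimal sketch: `speakerTurns` and `speakerCount` are hypothetical helpers, and it assumes each word object carries `speaker`, `word`, `start`, and `end` fields, as diarized prerecorded responses generally do. The sample data is made up and is not taken from the requests listed.

```javascript
// Hypothetical helper: merge consecutive words from the same speaker into one turn.
// Assumes each word looks like { word, start, end, speaker }.
function speakerTurns(words) {
  const turns = [];
  for (const w of words) {
    const last = turns[turns.length - 1];
    if (last && last.speaker === w.speaker) {
      // Same speaker as the previous word: extend the current turn.
      last.text += " " + w.word;
      last.end = w.end;
    } else {
      // Speaker changed (or first word): start a new turn.
      turns.push({ speaker: w.speaker, start: w.start, end: w.end, text: w.word });
    }
  }
  return turns;
}

// Hypothetical helper: number of distinct speaker labels in a word list.
function speakerCount(words) {
  return new Set(words.map((w) => w.speaker)).size;
}

// Made-up sample words, for illustration only.
const sampleWords = [
  { word: "hi", start: 0.0, end: 0.4, speaker: 0 },
  { word: "there", start: 0.5, end: 0.9, speaker: 0 },
  { word: "hello", start: 1.0, end: 1.5, speaker: 1 },
];

console.log(speakerTurns(sampleWords));
console.log("distinct speakers:", speakerCount(sampleWords));
```

Running this over the word list from each request and comparing turn boundaries and speaker counts gives a more specific description of the mismatch than "bad".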

half venture BOT
#

Thanks for asking your question. Please be sure to reply with as much detail as possible so the community can assist you efficiently.
-# If you haven't done so, ensure your Discord and Github profiles are linked to Deepgram so you can earn points to redeem on cool stuff just by being active!

#

It looks like we're missing some important information to help debug your issue. Would you mind providing us with the following details in a reply?

  • The programming language you are working in (e.g. JavaScript, Python).

lethal cargo
#

This is JavaScript, although that's not relevant.

kindred birch BOT
#

Thank you for the feedback. Can you be more specific about the regressions you're seeing? Are these new regressions that were introduced over time specifically in Nova-3? We've looked into these specific requests and are digging further, but any additional context you can provide would be very helpful. One thing to note is that these requests are all MP4s. For optimal transcription and diarization performance in general, we recommend first converting MP4 files to audio-specific formats.
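
As a rough illustration of the MP4 conversion step in Node.js, one option is to shell out to `ffmpeg` and drop the video stream before uploading. This is a sketch under assumptions: the thread does not name a tool or a target format, so both `ffmpeg` (which must be installed and on the PATH) and 16-bit PCM WAV are choices made here, and `buildFfmpegArgs` and `extractAudio` are hypothetical helper names.

```javascript
const { spawn } = require("node:child_process");

// Build ffmpeg arguments that discard video and write 16-bit PCM audio.
// Assumption: WAV output; the original sample rate and channel layout are kept.
function buildFfmpegArgs(inputPath, outputPath) {
  return [
    "-y",               // overwrite the output file if it already exists
    "-i", inputPath,    // input MP4
    "-vn",              // drop the video stream
    "-c:a", "pcm_s16le", // encode audio as 16-bit little-endian PCM
    outputPath,
  ];
}

// Run ffmpeg and resolve with the output path once it exits successfully.
function extractAudio(inputPath, outputPath) {
  return new Promise((resolve, reject) => {
    const proc = spawn("ffmpeg", buildFfmpegArgs(inputPath, outputPath), {
      stdio: "ignore",
    });
    proc.on("error", reject); // e.g. ffmpeg is not installed
    proc.on("close", (code) => {
      if (code === 0) resolve(outputPath);
      else reject(new Error(`ffmpeg exited with code ${code}`));
    });
  });
}

// Example usage (file names are placeholders):
// extractAudio("call.mp4", "call.wav").then((p) => console.log("wrote", p));
```

Submitting both the original MP4 and the extracted audio with the same model and options would show whether the container format is actually contributing to the diarization difference.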