#is_final transcription not coming even after long pause of speech


oblique wadi
#

My config -

    deepgram.transcription.live({
      smart_format: true,
      model: 'nova-phonecall',
      encoding: 'mulaw',
      sample_rate: 8000,
      interim_results: true,
      language: 'en',
    });

I have a simple app logging the live transcriptions received from deepgram.

Problem - The is_final: true event arrives VERY late, or does not arrive until I say something else. You can see that the is_final: false transcripts already end with a full stop, and that I didn't say anything for a long time (shown in red in the log).

Request ID - a76e74f2-ba67-4c5c-8ade-6ba73ceabcfb

pine wraithBOT
#

Thanks for asking your question. Please be sure to reply with as much detail as possible so we can assist you efficiently. Such as:

  • Provide the request_id if you've a question about a transcription response.
  • The options you used or the api.deepgram.com URL you sent your request to, including parameters.
  • Any code snippets you can include.
  • Any audio you can include, or if you can't share it here please email it to us at [email protected] and provide a link to this thread.
oblique wadi
#

See the timestamps to estimate the time between logs; the red text shows the time elapsed since the previous log.
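
A minimal sketch of how log lines like these could be produced, so the gap between events is visible without a screenshot. `createTranscriptLogger` is a hypothetical helper, not part of the Deepgram SDK, and the `channel.alternatives[0].transcript` path is assumed from the usual shape of a streaming result.

```javascript
// Hypothetical helper (not part of the Deepgram SDK): formats one log line per
// transcript event and reports the gap since the previous event.
// `now` is injectable so the gap calculation can be checked without real timers.
function createTranscriptLogger(now = () => Date.now()) {
  let last = null;
  return function formatEvent(event) {
    const t = now();
    const gapMs = last === null ? 0 : t - last;
    last = t;
    // Assumed payload shape: channel.alternatives[0].transcript
    const transcript = event?.channel?.alternatives?.[0]?.transcript ?? '';
    const kind = event?.is_final ? 'final' : 'interim';
    return `[+${gapMs}ms] ${kind} "${transcript}"`;
  };
}
```

With SDK v2 this would presumably be called on the parsed payload inside the transcript listener, e.g. `console.log(formatEvent(JSON.parse(message)))`; treat that wiring as an assumption.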

#

I have been using Deepgram continuously with the same use case and speech, and it worked well before. It has only become flaky in the last 3-4 days.

oblique wadi
#

To be clear, this happens ~60% of the time, not always. I have also tried nova-2-phonecall and it's the same.

oblique wadi
#

@swift sluice @karmic terrace any help is appreciated

fluid spoke
#

I'm sorry you didn't get an answer to your question. There is a feature that we're in the process of documenting that may be helpful, which you can read about in this guide. Could you try the UtteranceEnd feature and see if it helps? https://developers.deepgram.com/docs/understanding-end-of-speech-detection-while-streaming
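
For anyone following the linked guide, here is a rough sketch of how the different streaming messages could be told apart on the client. It assumes that Results messages carry `is_final` and `speech_final`, and that UtteranceEnd arrives as a separate message with `type: "UtteranceEnd"` and a `last_word_end` field; please verify those field names against real payloads. `classifyMessage` is a hypothetical helper, not an SDK function.

```javascript
// Sketch: sort a raw streaming message into the cases discussed in this thread.
// Field names (type, is_final, speech_final, last_word_end) are assumptions
// taken from the linked guide, not verified against this account's payloads.
function classifyMessage(raw) {
  const msg = typeof raw === 'string' ? JSON.parse(raw) : raw;

  // UtteranceEnd has no transcript, so handle it before reading alternatives.
  if (msg.type === 'UtteranceEnd') {
    return { kind: 'utterance_end', lastWordEnd: msg.last_word_end ?? null };
  }
  // Anything else that is not a transcript result (e.g. metadata).
  if (msg.type && msg.type !== 'Results') {
    return { kind: 'other' };
  }

  const transcript = msg.channel?.alternatives?.[0]?.transcript ?? '';
  if (transcript === '') return { kind: 'empty', transcript };
  if (msg.speech_final) return { kind: 'speech_final', transcript };
  if (msg.is_final) return { kind: 'final', transcript };
  return { kind: 'interim', transcript };
}
```

Logging the returned `kind` for every message should make it easier to see whether the missing piece is the `is_final` result, the UtteranceEnd message, or both.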


swift sluice
#

thanks for answering this @fluid spoke I'm juggling a lot of things and my replies to discord have slowed down some

oblique wadi
#
  1. If you changed production behaviour of an API product in a way that breaks your customer's existing code and requires a change of code, you should inform them in advance.

  2. After reading the blog, I altered my code to send utterance_end_ms: 1000 as well, but it does not help. I am still not getting transcript events. It is EXTREMELY UNSTABLE: sometimes it gives absolutely no events, sometimes it gives blank interim transcript events, and once it gave me all the events (as expected, which is what used to happen earlier).

  3. Please seriously do some testing on your own product.

  4. These are the request IDs (from tests I ran just now) for your debugging: 3fa2d51e-87cc-4ce4-9157-98c0591fd478, 6c6c3153-b812-4d7e-af81-48e33d0d748a, dd56121c-0593-4c9b-945f-39b25946fef8, 331c9d8b-7d05-435d-a6d0-1b12b3eebd6f, 235d80ab-7c82-4bce-a2b9-f14a952aa984, 628402dd-c695-4f46-bd1e-1a68bb3cde38, 89b796dd-ee35-407b-a5ec-8c84a13c21e6, 41b81f4f-6382-4f73-9638-b99b6ec4ed1a

  5. I am using nodejs sdk "@deepgram/sdk": "^2.4.0",

swift sluice
#

so I took a look at the screenshot you have... have you tried using real sentences? is_final is trying to capture complete sentences/thoughts, and just saying the number 3 repeatedly probably isn't going to do a lot for you in that category.

if just saying 3 or other numbers over and over again is part of a normal conversation that you are looking to get transcribed, enabling interim_results like you are doing should be enough, and you probably don't need to care about the is_final flag at all

#

if saying the number 3 or other numbers over and over isn't typical of a conversation, I would try testing with samples that are representative of conversations and we can take a look here

oblique wadi
#

The use case is phone calls; in the example above, the AI asked me to rate the experience out of 5 (so I was saying 3).

Don't worry about the 3. I can handle that on my end with interim results, as you said.
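
One way "handling it with interim results" might look, as a rough sketch: treat the latest non-empty interim transcript as final once it has stayed unchanged for a quiet period. `InterimFinalizer` and `quietMs` are hypothetical names, this is a client-side workaround rather than a Deepgram feature, and the payload path is assumed as before.

```javascript
// Hypothetical client-side fallback, not a Deepgram feature: promote the latest
// non-empty interim transcript to "final" after it has been stable for quietMs.
class InterimFinalizer {
  constructor(quietMs = 1500) {
    this.quietMs = quietMs;
    this.text = '';     // pending interim transcript
    this.emitted = '';  // last transcript promoted by this fallback
    this.changedAt = 0; // caller-supplied time of the last change
  }

  // Feed every transcript message. Returns a transcript to treat as final, or null.
  push(msg, nowMs) {
    const transcript = msg.channel?.alternatives?.[0]?.transcript ?? '';
    if (msg.is_final) {
      // A real final wins, unless the fallback already emitted the same text.
      const duplicate = transcript === this.emitted;
      this.text = '';
      this.emitted = '';
      return transcript && !duplicate ? transcript : null;
    }
    if (transcript && transcript !== this.text && transcript !== this.emitted) {
      this.text = transcript;
      this.changedAt = nowMs;
    }
    return this.poll(nowMs);
  }

  // Also call this from a timer, since events may stop arriving entirely.
  poll(nowMs) {
    if (this.text && nowMs - this.changedAt >= this.quietMs) {
      this.emitted = this.text;
      this.text = '';
      return this.emitted;
    }
    return null;
  }
}
```

Passing the clock in as `nowMs` keeps the class free of timers, so the caller decides how often to `poll` (for example from a `setInterval`).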

#

My issue is that it's not catching "hello?", "hey", and "are you listening?"

#

Also, just FYI sometimes it does not even give the empty interim result that it usually did earlier

#

Also, this is happening only within last 10 days, was not happening earlier (been using the same for 2 months now)

swift sluice
#

as for updates, I'm fairly new-ish so I don't know how often models are updated. I'm guessing probably not too often, if at all, without providing versioned instances if the default one is updated.

I think the empty interim hasn't changed, but I can take a look. For the "hey" type words, you might need to enable filler_words (set to true) to get some of those filler words. You might want to give that a try.

as for the delay between saying things... if you are playing around with utterance_end_ms, you should realize that there are going to be delays when that number gets increased. So a value of 1000 is probably going to impact how often you get your results

oblique wadi
#

yep, I understand, and I am waiting for 5-10 seconds and still not getting the events

swift sluice
#

will try some things out and get back to you on what I notice

oblique wadi
#

Just noticed on your status page that there was an incident on Nov 28, 2023 (Monitoring Streaming Latency). I think that date roughly coincides with when I started facing this issue. It is highly likely that the issue was not properly resolved on your end.

swift sluice
#

is this still representative of your config?

deepgram.transcription.live({
      smart_format: true, 
      model: 'nova-phonecall',
      encoding: 'mulaw',
      sample_rate: 8000,
      interim_results: true,
      language: 'en',
    });
#

using utterance_end_ms or filler_words or etc? that isn't there?

#

just trying to do an apples to apples comparison

#

don't want to start trying stuff and then all of a sudden "hey I am also using these settings, sorry"

oblique wadi
#

i switched to using model: 'nova-2-phonecall', but i switched after i saw issues with model: 'nova-phonecall',

swift sluice
#

can you copy and paste your config here?

oblique wadi
#

i added utterance_end_ms: 1000, today after seeing the msg

#

    this.stream = deepgram.transcription.live({
      // observation - making smart_format false gives blank (is_final true) results every 3-4 seconds
      smart_format: true, // smart_format has punctuate: true internally

      // Models & Languages combination docs - https://developers.deepgram.com/docs/models-languages-overview
      model: 'nova-2-phonecall',

      // for twilio
      encoding: 'mulaw',
      sample_rate: 8000,

      // observation - interim_results: true keeps sending 1 event every second (which is final false)
      interim_results: true,
      language: 'en', // https://deepgram.com/product/languages

      // https://developers.deepgram.com/docs/understanding-end-of-speech-detection-while-streaming
      utterance_end_ms: 1000,
    });
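
For an apples-to-apples comparison, the same options can also be written out as the api.deepgram.com URL with its query parameters. This is only a sketch: `buildListenUrl` is a hypothetical helper, and it assumes the SDK passes these options through unchanged as query parameters on the `/v1/listen` streaming endpoint.

```javascript
// Sketch: express the live-transcription options as a streaming URL so the exact
// parameters can be shared in a support thread. Assumes the options map 1:1 onto
// query parameters of wss://api.deepgram.com/v1/listen.
function buildListenUrl(options) {
  const url = new URL('wss://api.deepgram.com/v1/listen');
  for (const [key, value] of Object.entries(options)) {
    url.searchParams.set(key, String(value));
  }
  return url.toString();
}

const listenUrl = buildListenUrl({
  smart_format: true,
  model: 'nova-2-phonecall',
  encoding: 'mulaw',
  sample_rate: 8000,
  interim_results: true,
  language: 'en',
  utterance_end_ms: 1000,
});
console.log(listenUrl);
```
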
swift sluice
#

ok cool. will take a look

oblique wadi
#

i tested now it is working

#

did you fix?

swift sluice
#

Nope. I never got the chance to look at it. Was too busy with other things to finish before the weekend.