#SpeechStarted continuously fires


shy saffron

I am using VAD and I am noticing that the SpeechStarted event is continuously firing even though there is radio silence on the mic.

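For example, here is my code:

```typescript
import { createClient } from "@deepgram/sdk";

// Assumes the API key is provided via the DEEPGRAM_API_KEY environment variable.
const deepgram = createClient(process.env.DEEPGRAM_API_KEY);

const live = deepgram.listen.live({
  model: "nova-3",
  language: "en-US",
  smart_format: true,
  encoding: "linear16",
  sample_rate: 24000,
  interim_results: true,
  utterance_end_ms: 5000, // send UtteranceEnd after a 5 s gap in transcribed words
  vad_events: true, // enables SpeechStarted events
  punctuate: true, // add punctuation (smart_format also applies it)
  // endpointing: 300,
});

live.on("SpeechStarted", (data: any) => {
  console.log("Speech started at:", data.timestamp);
});
```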

My console looks like this:

```
Speech started at: 0.21
Speech started at: 1.66
Speech started at: 3.23
Speech started at: 4.52
Speech started at: 5
Speech started at: 5.35
Speech started at: 5.82
Speech started at: 6.12
Speech started at: 6.71
Speech started at: 7.98
```

muted nexus (BOT)

Thanks for asking your question. Please be sure to reply with as much detail as possible so the community can assist you efficiently.
-# If you haven't done so, ensure your Discord and GitHub profiles are linked to Deepgram so you can earn points to redeem on cool stuff just by being active!


It looks like we're missing some important information to help debug your issue. Would you mind providing us with the following details in a reply?

  • The Deepgram product you are using (e.g., Speech to Text, Agent API)
  • A request ID that triggered your error or issue.

rotund flame (BOT)

What are you using the SpeechStarted event for? For context, for SpeechStarted we use a highly sensitive VAD by default, and VAD will pick up any noise, not just speech, so it can be really noisy. If you're trying to solve for agent barge-in, I would recommend instead using interim results and triggering your barge-in behavior at >= 2-3 words in an interim result (or a final result with > 0 words).
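
The word-count rule suggested above can be sketched as a small pure function over transcript messages. This is a minimal sketch, not Deepgram's SDK: the `DgResults`/`DgWord` interfaces are an assumed, trimmed subset of the live "Results" message (`is_final`, `channel.alternatives[0].words`), and `shouldBargeIn` and `fakeResults` are hypothetical names used only for illustration.

```typescript
// Assumed, trimmed shape of a Deepgram live "Results" message:
// only the fields this sketch reads. Real messages carry more.
interface DgWord {
  word: string;
  confidence: number;
}

interface DgResults {
  is_final: boolean;
  channel: { alternatives: { transcript: string; words: DgWord[] }[] };
}

// Barge-in rule from the reply above:
// - interim result: at least `minInterimWords` words (2-3 suggested)
// - final result: more than 0 words
function shouldBargeIn(msg: DgResults, minInterimWords = 2): boolean {
  // Read the top alternative; a missing alternative counts as "no words".
  const words = msg.channel.alternatives[0]?.words ?? [];
  return msg.is_final ? words.length > 0 : words.length >= minInterimWords;
}

// Hypothetical helper that builds a fake message for illustration only.
function fakeResults(isFinal: boolean, text: string, confidence = 0.9): DgResults {
  const words = text
    .split(/\s+/)
    .filter((w) => w.length > 0)
    .map((word) => ({ word, confidence }));
  return { is_final: isFinal, channel: { alternatives: [{ transcript: text, words }] } };
}

console.log(shouldBargeIn(fakeResults(false, "")));         // false (no words)
console.log(shouldBargeIn(fakeResults(false, "hey")));      // false (1 interim word)
console.log(shouldBargeIn(fakeResults(false, "hey stop"))); // true  (2 interim words)
console.log(shouldBargeIn(fakeResults(true, "stop")));      // true  (final, 1 word)
```

With the v3 JavaScript SDK this would presumably be attached to the transcript event, e.g. `live.on(LiveTranscriptionEvents.Transcript, (data) => { if (shouldBargeIn(data)) interruptAction(); })`, where `interruptAction` is a hypothetical stand-in for your own cancel logic. Raising `minInterimWords` to 3 trades a little latency for fewer false triggers.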

shy saffron

In my app, the user is supposed to be able to interrupt an action in progress by speaking, which is why I need VAD and SpeechStarted.

rotund flame (BOT)

Got it. Still, I would recommend using words in final or interim results for this rather than VAD events, as VAD is sensitive and will pick up more than just words.

shy saffron

Great, I'll try that then. I'm curious, though: what is VAD actually useful for, then?

shy saffron

I still think VAD is ideal though, because by detecting words we're lagging behind a bit more and our app needs to be really responsive. Is there a way to tweak the sensitivity of VAD and SpeechStarted?

plucky lily

VAD is better implemented on the client, so server latency doesn't factor into how effective it is. It's true that our VAD on the API is way too sensitive to be useful at this point.
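
To illustrate what client-side VAD can look like, here is a deliberately basic energy detector. It is a swapped-in, generic technique (RMS energy threshold with an onset count and a hangover), not Deepgram's VAD, and like any energy detector it will also fire on loud non-speech noise; a trained VAD model would separate speech from noise much better. `EnergyVad`, its options, and the threshold values are hypothetical and would need tuning per microphone. The sketch assumes linear16 audio cut into 20 ms frames at the 24000 Hz sample rate used in the question.

```typescript
// Simple energy-based VAD: RMS threshold + onset + hangover.
// Generic technique, NOT Deepgram's VAD; thresholds are hypothetical.
interface EnergyVadOptions {
  threshold: number; // normalized RMS (0..1) at or above which a frame is "loud"
  onsetFrames: number; // consecutive loud frames before reporting "start"
  hangoverFrames: number; // consecutive quiet frames before reporting "end"
}

class EnergyVad {
  private opts: EnergyVadOptions;
  private loudRun = 0;
  private quietRun = 0;
  private speaking = false;

  constructor(opts: EnergyVadOptions) {
    this.opts = opts;
  }

  // RMS of one linear16 (signed 16-bit PCM) frame, scaled to 0..1.
  static rms(frame: Int16Array): number {
    if (frame.length === 0) return 0;
    let sumSq = 0;
    for (let i = 0; i < frame.length; i++) {
      const x = frame[i] / 32768;
      sumSq += x * x;
    }
    return Math.sqrt(sumSq / frame.length);
  }

  // Feed one frame; returns "start" or "end" on a state change, otherwise null.
  process(frame: Int16Array): "start" | "end" | null {
    if (EnergyVad.rms(frame) >= this.opts.threshold) {
      this.loudRun++;
      this.quietRun = 0;
    } else {
      this.quietRun++;
      this.loudRun = 0;
    }
    if (!this.speaking && this.loudRun >= this.opts.onsetFrames) {
      this.speaking = true;
      return "start";
    }
    if (this.speaking && this.quietRun >= this.opts.hangoverFrames) {
      this.speaking = false;
      return "end";
    }
    return null;
  }
}

// Demo with synthetic audio at the question's 24 kHz sample rate.
const SAMPLE_RATE = 24000;
const FRAME = SAMPLE_RATE / 50; // 20 ms = 480 samples

// A 440 Hz tone standing in for speech.
function tone(amplitude: number): Int16Array {
  const f = new Int16Array(FRAME);
  for (let i = 0; i < FRAME; i++) {
    f[i] = Math.round(amplitude * 32767 * Math.sin((2 * Math.PI * 440 * i) / SAMPLE_RATE));
  }
  return f;
}
const silence = (): Int16Array => new Int16Array(FRAME);

const vad = new EnergyVad({ threshold: 0.02, onsetFrames: 3, hangoverFrames: 10 });
const frames = [
  ...Array.from({ length: 5 }, silence),
  ...Array.from({ length: 8 }, () => tone(0.3)),
  ...Array.from({ length: 12 }, silence),
];
frames.forEach((frame, i) => {
  const event = vad.process(frame);
  if (event) console.log(`frame ${i}: ${event}`);
});
// frame 7: start
// frame 22: end
```

The onset count keeps one- or two-frame clicks from triggering a start, and the hangover keeps short pauses between words from ending the turn; both run locally, so there is no network round trip before the app can react.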

I had some success looking for the FIRST word in an interim result that has high confidence (a low-confidence word could be a hallucination on background noise, just like VAD).
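
One reading of that approach, as a minimal sketch: scan the words of a transcript message and act on the first one whose confidence clears a cutoff. The `DgResults`/`DgWord` interfaces are an assumed, trimmed subset of the live "Results" message, and `firstConfidentWord` and the 0.8 cutoff are hypothetical choices, not values from Deepgram.

```typescript
// Assumed, trimmed shape of a Deepgram live "Results" message.
interface DgWord {
  word: string;
  confidence: number;
}

interface DgResults {
  is_final: boolean;
  channel: { alternatives: { transcript: string; words: DgWord[] }[] };
}

// Hypothetical cutoff: words below it are treated as possible
// hallucinations on background noise. Tune against your own audio.
const MIN_WORD_CONFIDENCE = 0.8;

// Return the first word in the message whose confidence clears the cutoff,
// or null if there is none. Works on interim and final results alike.
function firstConfidentWord(msg: DgResults, minConfidence = MIN_WORD_CONFIDENCE): DgWord | null {
  const words = msg.channel.alternatives[0]?.words ?? [];
  return words.find((w) => w.confidence >= minConfidence) ?? null;
}

// Made-up interim results for illustration (not real API output).
const noise: DgResults = {
  is_final: false,
  channel: { alternatives: [{ transcript: "uh", words: [{ word: "uh", confidence: 0.35 }] }] },
};
const speech: DgResults = {
  is_final: false,
  channel: {
    alternatives: [
      {
        transcript: "hmm wait",
        words: [
          { word: "hmm", confidence: 0.42 },
          { word: "wait", confidence: 0.93 },
        ],
      },
    ],
  },
};

console.log(firstConfidentWord(noise)?.word ?? "none");  // none
console.log(firstConfidentWord(speech)?.word ?? "none"); // wait
```

Compared with a fixed 2-3 word count, this can react on the very first word, at the cost of depending on how well the cutoff separates real speech from noise in your environment.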