#Deepgram VAD parameters

hybrid rock
#

Hello! I am using the STT with the Python SDK with the following options:
options = LiveOptions(
    model="nova-2",
    language="en-US",
    smart_format=True,
    # To get UtteranceEnd, the following must be set:
    interim_results=True,
    utterance_end_ms="1000",  # maybe adds delay?
    vad_events=True,
    endpointing=500,  # was 300
)
This does not seem optimal for capturing user speech: results with result.is_final and result.speech_final set do not arrive consistently, and I need a more reliable way to detect the end of speech. I was thinking about implementing VAD on the frontend (FE) side, but I am not sure. Does anyone here have experience or recommendations on the client settings, or on what kind of VAD to use with the STT?
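
For illustration, one way to cope with inconsistent speech_final flags is to buffer every is_final segment and flush the buffer on whichever end-of-speech signal arrives first: a result with speech_final set, or an UtteranceEnd event. The following is a minimal sketch, not an official Deepgram recipe. The class UtteranceAssembler and its method names are hypothetical, and wiring it to the SDK's transcript and UtteranceEnd handlers is assumed rather than shown.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class UtteranceAssembler:
    """Hypothetical helper: buffers finalized segments until the speaker stops."""

    _parts: List[str] = field(default_factory=list)

    def on_transcript(self, text: str, is_final: bool, speech_final: bool) -> Optional[str]:
        """Feed one transcript result; returns a full utterance when one is complete."""
        # Interim results (is_final=False) may still change, so they are not buffered.
        if is_final and text.strip():
            self._parts.append(text.strip())
        # speech_final means endpointing detected a pause: emit what we have.
        if speech_final:
            return self._flush()
        return None

    def on_utterance_end(self) -> Optional[str]:
        """Fallback for when speech_final never arrived: flush pending segments."""
        return self._flush()

    def _flush(self) -> Optional[str]:
        # Nothing buffered means the utterance was already emitted.
        if not self._parts:
            return None
        utterance = " ".join(self._parts)
        self._parts.clear()
        return utterance
```

In a transcript handler, the call would look like assembler.on_transcript(text, result.is_final, result.speech_final), with on_utterance_end() called from the UtteranceEnd handler. Because the buffer is cleared on every flush, an UtteranceEnd that follows a speech_final result returns None instead of a duplicate.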

rare zephyrBOT
#

Thanks for asking your question. Please be sure to reply with as much detail as possible so we can assist you efficiently. Such as:

  • Provide the request_id if you've a question about a transcription response.
  • The options you used or the api.deepgram.com URL you sent your request to, including parameters.
  • Any code snippets you can include.
  • Any audio you can include, or if you can't share it here please email it to us at [email protected] and provide a link to this thread.
hybrid rock
#

The way I am using it right now: a WebSocket connection from the FE to the BE, streaming audio bytes directly from the FE to the BE and into the Deepgram (DG) Python SDK.

urban solar
#

@hybrid rock This might be something @scarlet kelp has some suggestions on when he has some time to look at this question. Thanks for being patient.

scarlet kelp
hybrid rock
#

Not sure about increasing endpointing; shouldn't it rather be reduced? This is for a chat use case, i.e. as soon as there is a pause, it should trigger speech_final.

scarlet kelp
#

It all just depends on what is important to you, and that will be reflected in the settings. Depending on what you set, you might find is_final triggering too early. Again, it all depends on what you are trying to do.

#

My advice would be to try the settings and see what works best. I don't know what kind of app (audio, chat, etc.) this is. You might find other things work best for you.

hybrid rock
#

Hey David, is_final and speech_final are not triggered fast enough for me, even with the utterance_end and endpointing features. Is there a trigger like onSpeechEnd? I.e. detect the end of speech, wait 300 ms, and stop?
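
The "detect the end of speech, wait 300 ms, and stop" idea can be sketched as a small energy-based detector running on the audio frames before they are forwarded. This is only an illustration under stated assumptions, not a Deepgram feature: SpeechEndDetector, the RMS threshold of 500, and the 20 ms frame size are all hypothetical choices, and the frames are assumed to be sequences of 16-bit PCM sample values.

```python
import math
from typing import Sequence


def rms(frame: Sequence[int]) -> float:
    """Root-mean-square amplitude of one frame of PCM samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


class SpeechEndDetector:
    """Hypothetical detector: fires once after hangover_ms of silence following speech."""

    def __init__(self, frame_ms: int = 20, hangover_ms: int = 300, threshold: float = 500.0):
        self.frame_ms = frame_ms        # duration of each frame passed to process()
        self.hangover_ms = hangover_ms  # how long silence must last before firing
        self.threshold = threshold      # RMS level treated as speech (assumed value)
        self._in_speech = False
        self._silence_ms = 0

    def process(self, frame: Sequence[int]) -> bool:
        """Return True exactly once per utterance, when the speaker has stopped."""
        if rms(frame) >= self.threshold:
            # Loud frame: we are in speech, so reset the silence counter.
            self._in_speech = True
            self._silence_ms = 0
            return False
        if not self._in_speech:
            # Silence before any speech: nothing to end yet.
            return False
        self._silence_ms += self.frame_ms
        if self._silence_ms >= self.hangover_ms:
            # Enough trailing silence: report speech end and re-arm.
            self._in_speech = False
            self._silence_ms = 0
            return True
        return False
```

A fixed energy threshold is fragile with background noise, so a trained VAD model would likely be more robust in practice. The hangover timer logic stays the same either way.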

scarlet kelp
#

Are you pausing or stopping the audio stream at all? What are the settings you are currently using?

hybrid rock
#

The following options are set for LiveOptions (can't paste all of them since the chat blocks it):

smart_format=True,
interim_results=True,
utterance_end_ms="1000",
endpointing=100,