Deepgram VAD parameters | Deepgram

hybrid rock Jul 10, 2024, 4:37 PM

Hello! I am using the STT with python sdk with the following options:
options = LiveOptions(
    model="nova-2",
    language="en-US",
    smart_format=True,
    # To get UtteranceEnd, the following must be set:
    interim_results=True,
    utterance_end_ms="1000",  # maybe adds delay?
    vad_events=True,
    endpointing=500,  # was 300
)
This doesn't seem optimal for capturing user speech: results with both result.is_final and result.speech_final set arrive inconsistently, and I need a more reliable way to detect the end of speech. I was thinking about implementing VAD on the FE side, but I'm not sure. Maybe someone here has experience with this, or recommendations on the client settings or on what kind of VAD to use with the STT?
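One common way to make this more consistent is to buffer every is_final segment, emit the joined text when speech_final arrives, and treat UtteranceEnd as a fallback flush. The sketch below shows only that buffering logic. The class and method names (UtteranceAssembler, on_transcript, on_utterance_end) are hypothetical and not part of the Deepgram SDK; wiring them into the SDK's transcript and UtteranceEnd callbacks is assumed and not shown.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class UtteranceAssembler:
    """Hypothetical helper: joins finalized segments into whole utterances."""

    _parts: List[str] = field(default_factory=list)

    def on_transcript(
        self, transcript: str, is_final: bool, speech_final: bool
    ) -> Optional[str]:
        # Interim results are replaced by later results, so they are skipped.
        if not is_final:
            return None
        if transcript:
            self._parts.append(transcript)
        # speech_final signals that endpointing detected a pause.
        if speech_final:
            return self._flush()
        return None

    def on_utterance_end(self) -> Optional[str]:
        # Fallback: flush buffered text if speech_final never arrived.
        return self._flush()

    def _flush(self) -> Optional[str]:
        if not self._parts:
            return None
        utterance = " ".join(self._parts)
        self._parts.clear()
        return utterance
```

With this shape, a finalized segment that arrives without speech_final is not lost: it is either joined to the next segment or emitted when UtteranceEnd fires.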

rare zephyr BOT Jul 10, 2024, 4:37 PM

Thanks for asking your question. Please be sure to reply with as much detail as possible so we can assist you efficiently, such as:

Provide the request_id if you have a question about a transcription response.
The options you used or the api.deepgram.com URL you sent your request to, including parameters.
Any code snippets you can include.
Any audio you can include, or if you can't share it here please email it to us at [email protected] and provide a link to this thread.

hybrid rock Jul 10, 2024, 4:38 PM

The way I am using it right now: a WS connection from FE to BE, streaming audio bytes directly from the FE to the BE and into the DG Python SDK.

urban solar Jul 19, 2024, 11:08 PM

@hybrid rock This might be something @scarlet kelp has some suggestions on when he has some time to look at this question. Thanks for being patient.

scarlet kelp Jul 20, 2024, 12:24 AM

hi @hybrid rock

What exactly is this app trying to do? Like, what is its primary function (basic transcription, call coaching, AI assistant, etc.)?

depending on the length of your typical pause from your users, you might need to increase the endpointing time:
https://developers.deepgram.com/docs/endpointing

Deepgram Docs

Live Streaming - Endpointing - Deepgram Docs

Endpointing returns transcripts when pauses in speech are detected.

hybrid rock Jul 24, 2024, 8:35 AM

I'm not sure about increasing endpointing; shouldn't I reduce it instead? The app is for chatting, i.e. as soon as there is a break in speech, it should trigger speech_final.

scarlet kelp Jul 24, 2024, 6:07 PM

It all depends on what is important to you, and that will be reflected in the settings. Depending on what you set, you might find is_final triggering too early. Again, it all depends on what you are trying to do.

My advice would be to try the settings and see what works best. I don't know what kind of app this is (audio, chat, etc.). You might find other things work best for you.

hybrid rock Aug 7, 2024, 10:48 AM

Hey David, I do not get is_final or speech_final triggered fast enough, even with the utterance_end and endpointing features. Is there a trigger for onSpeechEnd? I.e. on onSpeechEnd, wait 300 ms and then stop?
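If no such trigger is available, the "wait 300 ms and stop" idea can be approximated on the receiving side with a silence timer: note the time of the last non-empty transcript and declare end of speech once nothing new has arrived for the chosen window. The sketch below is a hypothetical helper (SpeechEndTimer is not a Deepgram API) and takes timestamps as arguments so it stays independent of any event loop. Interim results may arrive in batches, so a window as short as 300 ms could fire in the middle of a sentence and would need tuning.

```python
from typing import Optional


class SpeechEndTimer:
    """Hypothetical helper: reports end of speech after a quiet window."""

    def __init__(self, silence_ms: float = 300.0) -> None:
        self.silence_ms = silence_ms
        self._last_activity_ms: Optional[float] = None

    def on_activity(self, now_ms: float) -> None:
        # Call this whenever a non-empty interim or final transcript arrives.
        self._last_activity_ms = now_ms

    def poll(self, now_ms: float) -> bool:
        # Returns True once per utterance, when the quiet window has elapsed.
        if self._last_activity_ms is None:
            return False
        if now_ms - self._last_activity_ms >= self.silence_ms:
            self._last_activity_ms = None  # re-arm for the next utterance
            return True
        return False
```

In a real app, poll would be called from a periodic task with a monotonic clock such as time.monotonic() * 1000.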

scarlet kelp Aug 7, 2024, 1:57 PM

Are you pausing or stopping the audio stream at all? What are the settings you are currently using?

hybrid rock Aug 13, 2024, 3:39 PM

The following options are set for LiveOptions (I can't paste all of them since the chat blocks it):

smart_format=True,
interim_results=True,
utterance_end_ms="1000",
endpointing=100
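When the chat blocks a pasted options block, the same settings can be shared as the request URL, which is what the bot message above asks for. The sketch below assumes the options map one-to-one onto query parameters of the live streaming endpoint at wss://api.deepgram.com/v1/listen. build_listen_url is a hypothetical helper, not an SDK function, and only the four options quoted here are included.

```python
from urllib.parse import urlencode


def build_listen_url(options: dict) -> str:
    """Hypothetical helper: render options as a streaming request URL."""
    params = {}
    for key, value in options.items():
        if isinstance(value, bool):
            # Query strings carry booleans as lowercase text.
            params[key] = "true" if value else "false"
        else:
            params[key] = str(value)
    return "wss://api.deepgram.com/v1/listen?" + urlencode(params)


options = {
    "smart_format": True,
    "interim_results": True,
    "utterance_end_ms": "1000",
    "endpointing": 100,
}
print(build_listen_url(options))
```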
