Hello! I am using STT with the Python SDK with the following options:
options = LiveOptions(
    model="nova-2",
    language="en-US",
    smart_format=True,
    # To get UtteranceEnd, the following must be set:
    interim_results=True,
    utterance_end_ms="1000",  # maybe adds delay?
    vad_events=True,
    endpointing=500,  # was 300
)
This does not seem optimal for capturing user speech: results where both result.is_final and result.speech_final are set do not arrive very consistently, and I need a more reliable way to detect the end of speech. I was thinking about implementing VAD on the FE side, but I am not sure. Maybe someone here has experience or recommendations on the client settings, or on what kind of VAD to use with the STT?
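One common way to handle these flags is to buffer every is_final segment and only treat the utterance as complete once speech_final is set, with the UtteranceEnd event as a fallback. Below is a minimal sketch of that idea, not the SDK's own API: UtteranceAssembler and its method names are hypothetical, and wiring it into the SDK's transcript and UtteranceEnd callbacks is left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class UtteranceAssembler:
    """Hypothetical helper: buffers finalized segments until speech ends."""

    parts: List[str] = field(default_factory=list)

    def on_transcript(
        self, transcript: str, is_final: bool, speech_final: bool
    ) -> Optional[str]:
        # Interim results (is_final=False) can still change, so skip them.
        if not is_final:
            return None
        if transcript:
            self.parts.append(transcript)
        # speech_final=True signals that endpointing detected a pause.
        return self.flush() if speech_final else None

    def on_utterance_end(self) -> Optional[str]:
        # Fallback for the case where speech_final never arrived.
        return self.flush()

    def flush(self) -> Optional[str]:
        # Join the buffered segments and reset; None if nothing was buffered.
        text = " ".join(self.parts).strip()
        self.parts.clear()
        return text or None
```

Calling on_transcript for every result and on_utterance_end for every UtteranceEnd message yields each complete utterance once, whichever signal arrives first.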
#Deepgram VAD parameters
Thanks for asking your question. Please be sure to reply with as much detail as possible so we can assist you efficiently. Such as:
- Provide the request_id if you've a question about a transcription response.
- The options you used or the api.deepgram.com URL you sent your request to, including parameters.
- Any code snippets you can include.
- Any audio you can include, or if you can't share it here please email it to us at [email protected] and provide a link to this thread.
The way I am using it right now: a WS connection from FE to BE, streaming audio bytes directly from the FE through the BE into the DG Python SDK.
@hybrid rock This might be something @scarlet kelp has some suggestions on when he has some time to look at this question. Thanks for being patient.
hi @hybrid rock
What exactly is this app trying to do? Like what is its primary function (basic transcription, call coaching, AI assistant, etc.)?
depending on the length of your typical pause from your users, you might need to increase the endpointing time:
https://developers.deepgram.com/docs/endpointing
Not sure about increasing endpointing; shouldn't I rather reduce it? The app is for chatting, i.e. as soon as there is a pause, it should trigger speech_final.
just all depends on what is important to you and that will reflect on the settings. depending on what you set, you might find is_final triggering too early. again, all depends on what you are trying to do
my advice would be to try the settings and see what works best. I don't know what kind of app, audio/chat, etc. is happening. you might find other things work best for you
Hey David, I do not get is_final or speech_final triggered fast enough, even with the utterance_end and endpointing features. Is there a trigger like onSpeechEnd? I.e. on onSpeechEnd, wait 300 ms and stop?
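The "wait 300 ms after speech ends" idea can also be approximated on the application side with a plain silence timer. The sketch below is an assumption about how that could look, not a Deepgram feature: SilenceTimer is a hypothetical class, and "activity" here means any event the app treats as speech (for example a non-empty interim transcript).

```python
import time
from typing import Callable, Optional


class SilenceTimer:
    """Hypothetical helper: reports speech end after a quiet period."""

    def __init__(
        self,
        timeout_ms: int = 300,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.timeout_s = timeout_ms / 1000.0
        self.clock = clock  # injectable so the timer can be tested
        self.last_activity: Optional[float] = None

    def on_activity(self) -> None:
        # Call whenever speech is observed, e.g. a non-empty transcript.
        self.last_activity = self.clock()

    def speech_ended(self) -> bool:
        # True exactly once after timeout_ms without any activity.
        if self.last_activity is None:
            return False
        if self.clock() - self.last_activity >= self.timeout_s:
            self.last_activity = None
            return True
        return False
```

Polling speech_ended() from the receive loop gives an onSpeechEnd-style signal; a shorter timeout responds faster but is more likely to cut users off mid-sentence.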
are you pausing or stopping the audio stream at all? what are the settings you are currently using?
The following options are set for LiveOptions (can't paste all of them since the chat blocks it):
smart_format=True,
interim_results=True,
utterance_end_ms="1000",
endpointing=100