OK, I believe we are closer than ever to having perfected the parameters for real-time streaming. One issue we just ran into is that if there is any background noise at all, even very faint, it prevents speech_final from activating, even when it's clear that it should and the person is done speaking. Is there a "sensitivity" parameter or a minimum decibel to include as part of this to solve this? When there's any background noise at all, it just takes in the speech and holds it in the inbound speech buffer for a long period of time, creating extreme latency.
#Need help optimizing the inbound message buffer slightly
Thanks for asking your question. Please be sure to reply with as much detail as possible so we can assist you efficiently. Such as:
- Provide the request_id if you've a question about a transcription response.
- The options you used or the api.deepgram.com URL you sent your request to, including parameters.
- Any code snippets you can include.
- Any audio you can include, or if you can't share it here please email it to us at [email protected] and provide a link to this thread.
This part:
Is there a "sensitivity" parameter or a minimum decibel to include as part of this to solve this?
That can be done on the client side. When I record for my own demos, I use a noise filter, among other things, and also a minimum threshold for the mic to activate.
Linking to a similar question you asked:
https://discord.com/channels/1108042150941294664/1186434494421549147
If the mic's minimum threshold has not been reached, just send "empty" audio (i.e., a linear16 stream with headers and the payload containing zeros).
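A rough sketch of that client-side gate, in case it helps. It assumes the audio arrives as little-endian 16-bit PCM (linear16) chunks; the function names and the -45 dBFS threshold are made up for illustration and would need tuning for your mic and room.

```python
import math
import struct

# Hypothetical threshold; tune it for your microphone and environment.
SILENCE_THRESHOLD_DBFS = -45.0


def rms_dbfs(chunk: bytes) -> float:
    """Return the RMS level of a little-endian 16-bit PCM chunk in dBFS."""
    n = len(chunk) // 2
    if n == 0:
        return float("-inf")
    samples = struct.unpack(f"<{n}h", chunk[: n * 2])
    mean_square = sum(s * s for s in samples) / n
    if mean_square == 0:
        return float("-inf")
    return 20 * math.log10(math.sqrt(mean_square) / 32768.0)


def gate(chunk: bytes, threshold_dbfs: float = SILENCE_THRESHOLD_DBFS) -> bytes:
    """Pass the chunk through if it is loud enough, else return zeros.

    The zeroed chunk has the same length, so the stream keeps its timing.
    """
    if rms_dbfs(chunk) < threshold_dbfs:
        return bytes(len(chunk))
    return chunk
```

You would call `gate(chunk)` on each chunk right before sending it over the websocket. A real gate usually also wants a short "hold" time so that quiet word endings are not clipped; that is left out here to keep the sketch short.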
We also have a solution that is very insensitive to noise: it uses the word timings of the transcription to determine when someone has finished speaking, so it is completely unaffected by background noise: https://github.com/nikolawhallon/temp-utterance-end
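To give an idea of the word-timing approach, here is a minimal sketch. It is not the code from the linked repository. It assumes each streaming result gives you a list of words with an `end` time in seconds, plus the position in the audio that the result covers up to (`audio_cursor`); the class name and the 1.0 s gap are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UtteranceEndDetector:
    """Flags the end of an utterance from word timings alone."""

    gap_seconds: float = 1.0  # hypothetical value; tune to taste
    last_word_end: Optional[float] = None

    def update(self, words: list, audio_cursor: float) -> bool:
        """Feed one transcription result; return True when the speaker is done.

        words: word dicts with an "end" time in seconds (may be empty).
        audio_cursor: how far into the audio this result reaches, in seconds.
        """
        if words:
            self.last_word_end = max(w["end"] for w in words)
        if self.last_word_end is None:
            return False  # nothing has been said yet
        if audio_cursor - self.last_word_end >= self.gap_seconds:
            self.last_word_end = None  # reset for the next utterance
            return True
        return False
```

Because only the timing of recognized words is looked at, faint background noise that never turns into words cannot hold the utterance open.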