#What can I do if Deepgram doesn't pick up spoken words because they are too quiet


rough basin BOT
#

Hey there! This channel allows you to ask Kapa.ai for assistance. But it looks like you haven't connected your Discord profile to your Deepgram account yet. You can do so at https://community.deepgram.com - by doing so, you'll be able to ask questions in #1115960287183511643 if you need human assistance!

short fossil BOT
#

Hi @urban pulsar. I'm kapa.ai, your friendly support bot. I will try to answer your question by searching through the knowledge base. :wave:

#

If quiet speech isn’t being captured well, here are practical steps you can try:

  • Use a model better suited to challenging/low-level audio, e.g., nova-2-phonecall. It’s trained on lower-quality domains and can be more robust. [Poor caption]
  • Add keyword boosting for short or easily-missed words (e.g., “yes,” “yup,” names, jargon) to increase their likelihood of being recognized. Tune boosts carefully. [Missed word; Keywords use cases]
  • Preprocess audio to improve SNR: apply noise suppression or send cleaner mic input (headphones vs. open speakers). Third‑party noise suppression before sending to Deepgram can reduce false positives and help quiet speech stand out. [Noise suppression; Headphones vs phone]
  • If this is live streaming and utterances are getting cut, lower the endpointing value, or use UtteranceEnd (utterance_end_ms) when you need to wait longer than 1000 ms; long endpointing values combined with background noise can cause early or late cuts. [Endpointing limits; Live cutoff]
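
As a rough illustration of how the model, keyword, and endpointing suggestions above map onto request parameters, here is a minimal Python sketch that only builds the request URL. It assumes the streaming endpoint is wss://api.deepgram.com/v1/listen and that keyword boosts are passed as term:boost pairs; the helper name build_listen_url and the boost value of 2 are made up for illustration, so check the current docs before relying on them.

```python
from urllib.parse import urlencode

# Assumed base URL for the streaming endpoint.
BASE_URL = "wss://api.deepgram.com/v1/listen"


def build_listen_url(model, keywords=None, endpointing_ms=None):
    """Build a listen URL with optional keyword boosts and endpointing.

    Hypothetical helper for illustration; not part of any SDK.
    """
    params = [("model", model)]
    # Each boosted term becomes its own keywords=term:boost parameter.
    for term, boost in (keywords or {}).items():
        params.append(("keywords", f"{term}:{boost}"))
    if endpointing_ms is not None:
        params.append(("endpointing", str(endpointing_ms)))
    return f"{BASE_URL}?{urlencode(params)}"


# Illustrative values: boost two short words, wait 300 ms of silence.
url = build_listen_url("nova-2-phonecall", {"yes": 2, "yup": 2}, endpointing_ms=300)
print(url)
```

The colon in each keyword pair is percent-encoded by urlencode, which is ordinary URL encoding. Start with small boosts and raise them only if the words are still missed.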

If problems persist, share request IDs and examples with Deepgram Support so they can review your audio and responses. [Report accuracy]

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:

#

@urban pulsar Kapa.ai is still learning and improving, please let me know how I did by reacting below.

#

Let me check the knowledge base for you again. One moment! :books:

#

Short answer: No.

  • nova-2-phonecall is listed only for English (en, en-US). German is supported on nova-2 general, but not the phonecall variant. [Nova-2]
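
A minimal sketch of that rule, if you want to pick the model in code: use the phonecall variant only for the language codes listed above and fall back to general nova-2 otherwise. The function name pick_model is hypothetical, and the fallback assumes nova-2 supports the language you pass in, which you should verify for anything other than German.

```python
# Language codes the answer above lists for nova-2-phonecall.
PHONECALL_LANGUAGES = {"en", "en-US"}


def pick_model(language):
    """Return the phonecall model for English, general nova-2 otherwise.

    Hypothetical helper; assumes nova-2 covers the requested language.
    """
    if language in PHONECALL_LANGUAGES:
        return "nova-2-phonecall"
    return "nova-2"


print(pick_model("en-US"))
print(pick_model("de"))
```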

#

Your guidance is mostly good, but I’d tweak it to align with Deepgram’s docs and set clearer expectations:

  • Keep default fast: Endpointing defaults to 10 ms and is great for low latency. [Endpointing]
  • Explain trade-offs: Longer endpointing improves “end of speech” reliability but can fail in noisy environments because VAD may never detect long pure silence; >1000 ms is especially fragile on phone/Twilio. [Limitations; Max value insights]
  • Offer a robust fallback: Recommend UtteranceEnd for end-of-utterance detection when endpointing is unreliable. Requires interim_results=true; set utterance_end_ms ≥1000 ms (e.g., 1000–2000). [UtteranceEnd; Limitations]
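
To make the UtteranceEnd requirements concrete, here is a small sketch that builds the query string and rejects values below the 1000 ms floor mentioned above. The helper name utterance_end_query is an assumption for illustration; only the parameter names interim_results, utterance_end_ms, and endpointing come from this thread.

```python
from urllib.parse import urlencode

# Lower bound quoted in the answer above.
MIN_UTTERANCE_END_MS = 1000


def utterance_end_query(utterance_end_ms=1000, endpointing_ms=None):
    """Return a query string that turns on UtteranceEnd.

    Hypothetical helper; interim_results is always enabled because
    UtteranceEnd depends on it.
    """
    if utterance_end_ms < MIN_UTTERANCE_END_MS:
        raise ValueError("utterance_end_ms should be at least 1000")
    params = {
        "interim_results": "true",
        "utterance_end_ms": str(utterance_end_ms),
    }
    if endpointing_ms is not None:
        params["endpointing"] = str(endpointing_ms)
    return urlencode(params)


print(utterance_end_query(1500, endpointing_ms=300))
# prints: interim_results=true&utterance_end_ms=1500&endpointing=300
```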

Suggested copy:

#
  • “Endpointing detects silence to decide when to send a final chunk. Default is 10 ms for lowest latency.” [Endpointing]
  • “If you see premature/fragmented results, try endpointing=300 to wait a bit longer before finalizing.” [Endpointing & interim]
  • “In noisy audio or for longer pauses, endpointing may not fire reliably. Enable interim_results=true and use utterance_end_ms=1000–2000 to get a reliable UtteranceEnd signal based on word timing, not raw audio.” [Limitations; UtteranceEnd]
  • “If issues persist, share examples so we can help tune settings.”

I’d avoid promising “quality and reliability will be better” at 300 in all cases; it depends on noise. Also add the UtteranceEnd path instead of “we’ll find another solution.”
