# Streaming Transcription: No Results when endpointing=false


sleek sigil
#

I'm trying to set up Deepgram transcription in my project.

Request ID: f83352fa-1bfe-4ac9-baeb-775817efcf97

I already have my own VAD setup, so I'm trying to use that. My flow is roughly:

  1. Open the websocket
  2. Send some audio buffers
  3. Wait for my VAD silence event
  4. Send the {type: "Finalize"} message
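
The flow above can be sketched as a pure function from local events to WebSocket frames. This is only an illustration: the module name, the event shapes, and the `{:binary, _}` / `{:text, _}` frame tuples (a convention used by Elixir clients such as WebSockex) are assumptions, not code from the project. The `Finalize` payload is the one from step 4.

```elixir
defmodule FlowSketch do
  # Hypothetical helper: maps a local event to the frame that would be sent.
  @finalize ~s({"type":"Finalize"})

  # A chunk of raw PCM audio goes out unchanged as a binary frame.
  def frame_for({:audio, pcm}) when is_binary(pcm), do: {:binary, pcm}

  # The local VAD reported silence: send the Finalize control message as text.
  def frame_for(:vad_silence), do: {:text, @finalize}
end
```

Keeping this step free of side effects makes it easy to check in isolation which frame each event produces before wiring it to a real socket.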

I only get one message back from Deepgram. It has is_final set, but the transcript is empty:

     [debug] Received WebSocket message from Deepgram: "{\"type\":\"Results\",\"channel_index\":[0,1],\"duration\":0.7,\"start\":0.0,\"is_final\":true,\"channel\":{\"alternatives\":[{\"transcript\":\"At\",\"confidence\":0.38793945,\"words\":[{\"word\":\"at\",\"start\":0.32,\"end\":0.7,\"confidence\":0.38793945,\"punctuated_word\":\"At\"}]}]},\"metadata\":{\"request_id\":\"f83352fa-1bfe-4ac9-baeb-775817efcf97\",\"model_info\":{\"name\":\"general-nova-3\",\"version\":\"2025-01-09.0\",\"arch\":\"nova-3\"},\"model_uuid\":\"bf05427e-a1f1-4ced-a976-38b2f3533d8d\"},\"from_finalize\":true}", metadata: line=104 pid=<0.710.0> file=lib/smartvox/speech_to_text/stt_deepgram_streaming_client.ex domain=elixir application=smartvox mfa=Smartvox.SpeechToText.DeepgramStreamingClient.handle_frame/2 
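
To make a message of this shape easier to inspect, the fields that matter here can be pulled out with pattern matching. A minimal sketch, assuming the JSON text has already been decoded into a map with string keys (the decoding library is left out on purpose); `ResultsSketch` and `summarize/1` are hypothetical names, and the sample map copies only a few fields from the log line above.

```elixir
defmodule ResultsSketch do
  # Hypothetical helper: summarises an already-decoded "Results" message.
  def summarize(%{"type" => "Results", "channel" => %{"alternatives" => [best | _]}} = msg) do
    %{
      transcript: Map.get(best, "transcript", ""),
      is_final: Map.get(msg, "is_final", false),
      from_finalize: Map.get(msg, "from_finalize", false)
    }
  end

  # Any other message shape is ignored in this sketch.
  def summarize(_other), do: :ignore
end

# A trimmed-down copy of the message in the log line above.
msg = %{
  "type" => "Results",
  "is_final" => true,
  "from_finalize" => true,
  "channel" => %{"alternatives" => [%{"transcript" => "At", "confidence" => 0.38793945}]}
}

IO.inspect(ResultsSketch.summarize(msg))
```

Logging this summary next to the raw frame shows at a glance whether a final result came from a `Finalize` request and whether it carried any text.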

My body params when opening the socket. Am I configuring this incorrectly?

params = %{
  encoding: "linear16",
  sample_rate: Keyword.get(opts, :sample_rate, 16000),
  channels: 1,
  model: "nova-3",
  language: Keyword.get(opts, :language, "en"),
  interim_results: false,
  smart_format: true,
  endpointing: false,
  utterances: false  # Utterance detection disabled
}
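
For reference, here is one way options like these could end up on the connection URL. A minimal sketch, assuming they are passed as query-string parameters on Deepgram's `wss://api.deepgram.com/v1/listen` endpoint; `UrlSketch` and `listen_url/1` are hypothetical names, and a keyword list is used instead of a map so the parameter order is predictable.

```elixir
defmodule UrlSketch do
  # Hypothetical helper, not part of the project.
  @base "wss://api.deepgram.com/v1/listen"

  # URI.encode_query/1 converts atoms, booleans and integers with to_string/1.
  def listen_url(params), do: @base <> "?" <> URI.encode_query(params)
end

url = UrlSketch.listen_url(encoding: "linear16", sample_rate: 16000, endpointing: false)
IO.puts(url)
```

Printing the final URL is a quick way to confirm that booleans such as `endpointing: false` are serialized as the literal string `false`.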
young widget BOT
#

Thanks for asking your question. Please be sure to reply with as much detail as possible so the community can assist you efficiently.

#

It looks like we're missing some important information to help debug your issue. Would you mind providing us with the following details in a reply?

  • The programming language you are working in (e.g. JavaScript, Python).
sleek sigil
#

I'm using Elixir

broken ruin
#

Hi @sleek sigil, the first thing I would check is that the audio settings are correct. You could also try omitting them if the audio you're sending is containerized and you're including the headers.

sleek sigil
#

I'm sending raw PCM, and I'm pretty sure the audio settings are right. If I omit them, though, will Deepgram automatically detect the correct audio format?

I did more testing this morning, and I'm a bit puzzled. If I omit endpointing (which is what I was doing previously) and send Finalize, I get no transcript back. If I set endpointing to a ridiculously high number (1500) and do not send Finalize, I start to get transcripts, although they don't cover the whole audio file.

Since I have my own VAD system, what's the right workflow & setup for sending audio to deepgram?

broken ruin
#

We use the headers to detect the audio format, so if you chop them off, or they have already passed before you connect, we wouldn't be able to detect it.
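
As an illustration of the difference, a client can check whether the first buffer it is about to send starts with a container header. A minimal sketch for the WAV case only, where a file begins with `RIFF`, a 4-byte little-endian size, and `WAVE`; `HeaderSketch` and `wav_header?/1` are hypothetical names. Raw PCM has no such prefix, so `encoding` and `sample_rate` would need to be given explicitly.

```elixir
defmodule HeaderSketch do
  # Hypothetical helper: true when the buffer begins with a RIFF/WAVE header.
  def wav_header?(<<"RIFF", _size::little-32, "WAVE", _rest::binary>>), do: true

  # Raw PCM, or a chunk from the middle of a file, will not match.
  def wav_header?(_other), do: false
end
```

The check is only meaningful for the very first chunk of a stream, since later chunks never carry the header.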

Using Deepgram in combination with your own VAD is (in my opinion) the most robust way to avoid things like looping.

Perhaps it would be better to leave the connection open and use your VAD to decide when to start and stop sending KeepAlive messages.
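
One way to picture that suggestion is a small timer-driven decision function. A minimal sketch: the module name, the `:speech` / `:silence` states, the millisecond timestamps, and the 3-second interval are all assumptions chosen for illustration, and the interval should stay below whatever idle timeout the service enforces. The `{:text, _}` frame tuple follows the convention of clients such as WebSockex.

```elixir
defmodule KeepAliveSketch do
  @keep_alive ~s({"type":"KeepAlive"})
  # Assumed interval, in milliseconds.
  @interval_ms 3_000

  # During speech the audio frames themselves keep the connection busy.
  def on_tick(:speech, _last_sent_ms, _now_ms), do: :noop

  # During silence, send a KeepAlive once the interval has elapsed.
  def on_tick(:silence, last_sent_ms, now_ms) when now_ms - last_sent_ms >= @interval_ms,
    do: {:send, {:text, @keep_alive}}

  # Silence, but the last KeepAlive is still recent enough.
  def on_tick(:silence, _last_sent_ms, _now_ms), do: :noop
end
```

A caller would invoke `on_tick/3` from a periodic timer, passing the current VAD state and the time the last frame was sent, and push the returned frame to the socket when it gets `{:send, frame}`.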

sleek sigil
#

Sure, that's what I'm trying to do, though I'm confused about when and how to use the Finalize message. I was sending it when I detected silence, but that seemed to throw the system off. Is it acceptable not to send it?