# Switching from Streaming to Pre-Recorded


cerulean stag

Hi, I am trying to switch my application from the streaming API (https://developers.deepgram.com/reference/streaming) to the pre-recorded API (https://developers.deepgram.com/reference/pre-recorded).

I would like to know how to construct a request to the pre-recorded API that's as similar as possible to my current use of the streaming API.

Currently, my application uses the Deepgram streaming API, and I've been quite happy with the transcription quality.
I open a websocket connection to wss://api.deepgram.com/v1/listen?channels=1&encoding=linear16&sample_rate=16000&tier=nova and send frames of ~4096 bytes, each containing ~150 ms of audio.
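
As a quick sanity check on the frame size above, here is a minimal sketch of the arithmetic, assuming each frame is exactly 4096 bytes of headerless 16-bit mono PCM at 16 kHz (the `FrameMath` module name is made up for illustration). Under those assumptions a frame works out to 128 ms, which is in the same ballpark as the ~150 ms estimate.

```elixir
defmodule FrameMath do
  # Duration in milliseconds of a raw PCM frame.
  # bytes_per_sample defaults to 2 because linear16 uses 16-bit samples.
  def duration_ms(byte_count, sample_rate, channels, bytes_per_sample \\ 2) do
    byte_count * 1000 / (bytes_per_sample * channels * sample_rate)
  end
end

# 4096 bytes / 2 bytes per sample = 2048 samples; 2048 / 16000 Hz = 0.128 s
IO.inspect(FrameMath.duration_ms(4096, 16_000, 1))
# prints 128.0
```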

I have attempted to convert this to a regular HTTP request per the docs:

%{
  scheme: :https,
  host: "api.deepgram.com",
  method: "POST",
  path: "/v1/listen",
  headers: [
    {"Authorization", "Token <API_KEY>"},
    {"Content-Type", "audio/wav"}
  ],
  body: <<raw binary of ~4096 bytes (~150ms of audio)>>,
  query: "channels=1&encoding=linear16&sample_rate=16000&tier=nova"
}

However, this always yields an error response, even though the format of the audio being delivered has not changed:

{
  "err_code": "Bad Request",
  "err_msg": "Bad Request: failed to process audio: corrupt or unsupported data",
  "request_id": "<request_id>"
}

I'm wondering how I can switch from the streaming API to the pre-recorded API?
Any help is greatly appreciated!

stoic lantern
cerulean stag

Looks like using that URL worked without issue.

Request:

%{
  scheme: :https,
  host: "api.deepgram.com",
  port: 443,
  method: "POST",
  path: "/v1/listen",
  headers: [
    {"Authorization", "Token <API_KEY>"},
    {"Content-Type", "application/json"}
  ],
  body: "{\"url\":\"https://static.deepgram.com/examples/interview_speech-analytics.wav\"}",
  query: "channels=1&tier=nova",
}

Response:

%{
  "metadata" => %{
    # ...
  },
  "results" => %{
    # ...
  }
}

Really not sure why sending my raw audio would be considered corrupt or unsupported, though.

I know the format should be supported, because it's the same audio I send over the websocket.
I know my data is not corrupted: the exact same audio that I want transcribed is also saved to a file, audio.wav, which plays without issue, and running the command "file audio.wav" prints "audio.wav: RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 16000 Hz".
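
One thing that might be worth checking (an assumption, not something confirmed in this thread) is whether the bytes placed in the request body start with the same RIFF/WAVE header that the file command reports for audio.wav. The sketch below shows one way to test that in Elixir with binary pattern matching. The `WavCheck` module and the sample binaries are hypothetical and exist only for illustration.

```elixir
defmodule WavCheck do
  # Returns true when the binary begins with a RIFF container header whose
  # form type is WAVE: bytes 0-3 are "RIFF", bytes 4-7 are a little-endian
  # chunk size, and bytes 8-11 are "WAVE".
  def riff_wave?(<<"RIFF", _size::little-32, "WAVE", _rest::binary>>), do: true
  def riff_wave?(_other), do: false
end

# Hypothetical inputs, not real audio: a bare 12-byte RIFF/WAVE preamble
# and a few headerless 16-bit samples.
header_only = <<"RIFF", 36::little-32, "WAVE">>
raw_pcm = <<0, 0, 1, 0, 255, 255, 2, 0, 0, 0, 1, 0>>

IO.inspect(WavCheck.riff_wave?(header_only))
IO.inspect(WavCheck.riff_wave?(raw_pcm))
```

Running the same check against both the contents of audio.wav (for example via File.read!/1) and the exact binary used as the request body would show whether the two actually start with the same bytes.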