# Help streaming to websocket from iOS


low widget
#

Hi, I got an example (thanks @harsh crystal) of a direct websocket connection to Deepgram working, and now I am porting it to Apache Cordova. I am using the audioinput plugin, which returns WAV format data. I'm concatenating this data and sending it to the socket, but I get a blank response from the Deepgram API and it closes the socket (the apparently empty response is pasted at the bottom). Any idea how I can stream this WAV data to the socket?

My code is attached in the example.js file.

The API returns only:

{"type":"Metadata","transaction_key":"deprecated","request_id":"REQUEST_ID_REMOVED","sha256":"SHA_REMOVED","created":"2024-04-08T22:33:11.792Z","duration":0,"channels":0}

...and the socket then closes.

Thanks so much for any help!

  • Quinn McLaughlin
half haloBOT
#

Thanks for asking your question. Please be sure to reply with as much detail as possible so we can assist you efficiently. Such as:

  • Provide the request_id if you've a question about a transcription response.
  • The options you used or the api.deepgram.com URL you sent your request to, including parameters.
  • Any code snippets you can include.
  • Any audio you can include, or if you can't share it here please email it to us at [email protected] and provide a link to this thread.
pastel pond
#

Hi @low widget

It looks like you are sending raw PCM audio, not WAV audio.

You will need to pass the encoding and sample_rate parameters to tell Deepgram the format you are sending.

See:

e.g. by adding "encoding=linear16&sample_rate=44100":

wss://api.deepgram.com/v1/listen?diarize=true&model=nova&language=en&punctuate=true&smart_format=true&interim_results=false&encoding=linear16&sample_rate=sampleRate
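
A minimal sketch of building that URL with a real number substituted for the `sampleRate` placeholder. The helper name `buildListenUrl` is hypothetical, and the value passed in is assumed to match the sample rate the capture plugin is actually configured with:

```javascript
// Hypothetical helper (not part of any SDK): builds the streaming URL
// for raw 16-bit PCM, using the same query parameters as the URL above.
function buildListenUrl(sampleRate) {
  const params = new URLSearchParams({
    diarize: "true",
    model: "nova",
    language: "en",
    punctuate: "true",
    smart_format: "true",
    interim_results: "false",
    // Raw PCM has no header, so the format has to be declared explicitly.
    encoding: "linear16",
    sample_rate: String(sampleRate),
  });
  return `wss://api.deepgram.com/v1/listen?${params.toString()}`;
}
```

For example, `buildListenUrl(16000)` would be used when the capture config sets `sampleRate: 16000`.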

low widget
#

@pastel pond Thanks so much for this. I did get data sending to Deepgram now and I am getting results back. Unfortunately, they are always empty, even though I can see audio is being sent. I attached my current example.js file again with the most recent updates; I removed my concatenation function to make sure it wasn't the cause of the error. I see the data going over. Here is the tail end of my console log. Any ideas for me? The request IDs are here too:

sent to socket
Audio data received: 8000 samples
sent to socket
Audio data received: 7168 samples
sent to socket
Audio data received: 7168 samples
sent to socket
Audio data received: 7808 samples
sent to socket
DATA FROM DEEPGRAM:
{"type":"Results","channel_index":[0,1],"duration":3.5300007,"start":24.9,"is_final":true,"speech_final":false,"channel":{"alternatives":[{"transcript":"","confidence":0,"words":[]}]},"metadata":{"request_id":"5d4dacd0-6195-4015-8ccc-4c659f82dc0e","model_info":{"name":"general-nova","version":"2023-07-28.18608","arch":"nova"},"model_uuid":"b227621c-0920-4128-b4b5-a3e0f525d2d7"},"from_finalize":false}
Time interval since last message: 27.207 seconds
Audio data received: 7168 samples
sent to socket
Audio data received: 7808 samples
sent to socket
Audio data received: 7168 samples
sent to socket
Audio data received: 7808 samples
sent to socket
Audio data received: 7168 samples
sent to socket
DATA FROM DEEPGRAM:
{"type":"Results","channel_index":[0,1],"duration":3.1100006,"start":28.43,"is_final":true,"speech_final":false,"channel":{"alternatives":[{"transcript":"","confidence":0,"words":[]}]},"metadata":{"request_id":"5d4dacd0-6195-4015-8ccc-4c659f82dc0e","model_info":{"name":"general-nova","version":"2023-07-28.18608","arch":"nova"},"model_uuid":"b227621c-0920-4128-b4b5-a3e0f525d2d7"},"from_finalize":false}
Time interval since last message: 29.435 seconds
Audio data received: 7808 samples
sent to socket
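
For reference, each `Results` message in the log above carries its transcript at `channel.alternatives[0].transcript`, which is the empty string in both cases. A minimal sketch of reading it from a raw message (the helper name `extractTranscript` is hypothetical):

```javascript
// Hypothetical helper: returns the transcript from one Deepgram message,
// or null for non-"Results" messages such as "Metadata".
function extractTranscript(rawMessage) {
  const msg = JSON.parse(rawMessage);
  if (msg.type !== "Results") return null;
  const alternatives = msg.channel && msg.channel.alternatives;
  const best = alternatives && alternatives[0];
  return best ? best.transcript : null;
}
```

An empty string here means Deepgram decoded the stream but found no speech in it, which is different from the socket being closed with a `Metadata` message.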

pastel pond
#

e.g.

socket.send(event.data);
low widget
#

Just tried that - no luck

#

Is linear16 the right encoding to send when I open the websocket?

#

I think the default audio type when a browser sends a MediaRecorder stream is WebM, but the audioinput plugin for Cordova returns 16-bit PCM.

#

Thanks for your help @pastel pond

low widget
#

I tried routing the data through an AudioContext and got that stream working, but still no luck getting data back: empty results as above.

pastel pond
#

If you're sending us WebM Opus, you should remove the encoding and sample_rate params.

#

WebM Opus has headers with the right info.

#

Does the code in the example above not work for you on Cordova?

low widget
#

The Node.js code above works great in a browser but won't work in Cordova, since the browser is running in the iOS sandbox, which removes the browser's MediaRecorder function. The audioinput plugin works around it, but my code sending its PCM data doesn't seem to work right with Deepgram. I'd love to get that working. I tried a second path, pushing the audio through an AudioContext (with the audioinput plugin set accordingly), but I'm pretty hazy on how that would work, or if it would even work.

Any ideas on how to get the 16-bit PCM to read? I wasn't sure if linear16 requires a special byte somewhere, or ...?

#

Thanks again for your help @pastel pond !

pastel pond
#

Can you capture the audio to a file and listen to it in something like Audacity?

I'm wondering if it's playable audio or corrupt.
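
One way to run that check, as a rough sketch: wrap the captured 16-bit samples in a standard 44-byte WAV header so the file opens directly in Audacity. The helper name `pcm16ToWav` is hypothetical, and it assumes the samples are already signed 16-bit integers recorded at the given sample rate. Writing the resulting buffer to disk depends on the platform and is not shown.

```javascript
// Hypothetical helper: wraps signed 16-bit PCM samples in a minimal WAV
// (RIFF) container. Assumes `samples` is an Int16Array; for multi-channel
// audio the samples are assumed to be interleaved.
function pcm16ToWav(samples, sampleRate, numChannels = 1) {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);
  const writeAscii = (offset, text) => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // file size minus the first 8 bytes
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // size of the fmt chunk
  view.setUint16(20, 1, true); // audio format 1 = uncompressed PCM
  view.setUint16(22, numChannels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * numChannels * bytesPerSample, true); // byte rate
  view.setUint16(32, numChannels * bytesPerSample, true); // block align
  view.setUint16(34, 16, true); // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, dataSize, true);

  // Sample data is stored little-endian, as WAV requires.
  for (let i = 0; i < samples.length; i++) {
    view.setInt16(44 + i * bytesPerSample, samples[i], true);
  }
  return buffer;
}
```

If the file sounds like noise, or plays at the wrong speed, the sample format or sample rate being declared probably does not match what the plugin is producing.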

low widget
#

I was about to graph it on a visualizer to make sure I am getting data.

When I try to open the websocket with 'wss://api.deepgram.com/v1/listen?diarize=true&model=nova&language=en&punctuate=true&smart_format=true&interim_results=false', I get a blank response and the socket closes as soon as data is sent. So I guess my AudioContext processor is not sending Opus/WebM data.

So I will go back to my first approach - sending PCM with the linear16 encoding - and verify the audio data is in there.

My second approach (with audioContext) had been this - in case you see something obvious... but will work on first approach for now.

` const captureCfg = {
  sampleRate: 16000,
  bufferSize: 16384,
  channels: audioinput.CHANNELS.MONO,
  format: audioinput.FORMAT.PCM_16BIT,
  concatenateMaxChunks: 20,
  audioSourceType: audioinput.AUDIOSOURCE_TYPE.DEFAULT,
  streamToWebAudio: true
};

socket.onopen = () => {
  console.log({ event: 'onopen' });
  document.querySelector('#status').textContent = 'Connected';
  audioinput.start(captureCfg);
  aContext = audioinput.getAudioContext();
  let bufferSize = 16384;
  processor = aContext.createScriptProcessor(bufferSize, 1, 1);
  audioinput.connect(processor);
  processor.connect(audioinput.getAudioContext().destination);
  console.log('audioinput started');
  processor.onaudioprocess = function(audioProcessingEvent) {
    let inputBuffer = audioProcessingEvent.inputBuffer;
    let inputData = inputBuffer.getChannelData(0);
    console.log('sent via context');
  };
}`
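
One detail that matters on this path: `getChannelData(0)` returns a `Float32Array` with samples in the range -1 to 1, while `encoding=linear16` describes signed 16-bit little-endian PCM. A minimal sketch of that conversion follows. The helper name `floatTo16BitPcm` is hypothetical, it assumes a little-endian platform (which `Int16Array` uses on essentially all current devices), and it is not confirmed as the cause of the empty transcripts in this thread.

```javascript
// Hypothetical helper: converts Web Audio float samples (-1..1) into
// signed 16-bit PCM. Out-of-range input is clamped before scaling.
function floatTo16BitPcm(floatSamples) {
  const pcm = new Int16Array(floatSamples.length);
  for (let i = 0; i < floatSamples.length; i++) {
    const clamped = Math.max(-1, Math.min(1, floatSamples[i]));
    // Negative values scale to -32768, positive values to 32767.
    pcm[i] = clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff;
  }
  return pcm;
}
```

Inside `onaudioprocess`, the converted chunk could then be sent with something like `socket.send(floatTo16BitPcm(inputData).buffer)`.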

pastel pond
#

Where are you sending the audio data to Deepgram?

Make sure that you are actually sending audio there, maybe with some debug logs.

If you send an empty array to Deepgram over the websocket, we will treat that as a signal to close the websocket as well.
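
A small guard along those lines, as a sketch only. The helper name `sendIfNotEmpty` is hypothetical, and `socket` is assumed to be any object with a `send` method, such as a browser `WebSocket`:

```javascript
// Hypothetical helper: skips empty chunks so an empty payload is never
// sent by accident. `chunk` is assumed to be an ArrayBuffer or a typed
// array, both of which expose `byteLength`.
function sendIfNotEmpty(socket, chunk) {
  if (!chunk || chunk.byteLength === 0) {
    console.log("skipped empty audio chunk");
    return false;
  }
  socket.send(chunk);
  console.log(`sent ${chunk.byteLength} bytes to socket`);
  return true;
}
```

Logging the byte count on every send also makes it easy to confirm that real audio is leaving the app.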

low widget
#

There's data in the arrays, but the format doesn't seem to match. Giving up on Cordova for the moment and trying Flutter... I expect that to work better.

pastel pond