Speech to text realtime streaming websocket issue in new version 3.1.1 | Deepgram | Page 1

vague stream BOT Jan 31, 2024, 4:49 AM

#

Thanks for asking your question. Please be sure to reply with as much detail as possible so we can assist you efficiently. Such as:

Provide the request_id if you've a question about a transcription response.
The options you used or the api.deepgram.com URL you sent your request to, including parameters.
Any code snippets you can include.
Any audio you can include, or if you can't share it here please email it to us at [email protected] and provide a link to this thread.

tulip pagoda Jan 31, 2024, 5:02 AM

#

here's the full logs:

📎 message.txt

tulip pagoda Jan 31, 2024, 5:44 AM

#

request_id is 6c8198cf-de94-405f-9833-b0f4bc7dfb8c

tulip pagoda Jan 31, 2024, 7:45 AM

#

UPDATE

I found the issue regarding the error above; the utterance_end_ms parameter value cannot be <1000.
The documentation mentions that it's not recommended:

documentation

In practice, you should set the value of utterance_end_ms to be 1000 ms or higher. Deepgram's Interim Results are typically sent every 1 second, so using a value of less than 1 second will not offer any benefits.

However, an utterance_end_ms value <1000 yields a 400 Bad Request, and I think Deepgram's error handling could be more informative.

I had to do some digging to get the raw response:

HTTP/1.1 400 Bad Request
status_code 400 reason Bad Request headers content-type: application/json
dg-error: Invalid 'utterance_end_ms' value of '999'.
content-length: 145
dg-request-id: 587b3227-f6ca-4818-bb4d-028182adf4ea
...
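
For anyone hitting the same opaque 400: the useful detail sits in the dg-error and dg-request-id response headers shown above. A minimal sketch of turning them into a readable message, assuming you have already pulled the status code and headers out of your WebSocket library's handshake exception (how you obtain them varies by library and version). `describe_handshake_failure` is a hypothetical helper, not part of the Deepgram SDK.

```python
from typing import Mapping


def describe_handshake_failure(status_code: int, headers: Mapping[str, str]) -> str:
    """Build a readable message from a rejected WebSocket handshake.

    Hypothetical helper: it only formats values you pass in.
    """
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    reason = lowered.get("dg-error", "no dg-error header present")
    request_id = lowered.get("dg-request-id", "unknown")
    return f"HTTP {status_code}: {reason} (request_id={request_id})"


if __name__ == "__main__":
    # Header values copied from the raw response quoted in this thread.
    print(
        describe_handshake_failure(
            400,
            {
                "content-type": "application/json",
                "dg-error": "Invalid 'utterance_end_ms' value of '999'.",
                "dg-request-id": "587b3227-f6ca-4818-bb4d-028182adf4ea",
            },
        )
    )
```

Logging the request ID alongside the error makes it easier to quote in a support thread like this one.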

After adjusting utterance_end_ms to 1000 and changing self.dg_connection = self.deepgram_client.listen.live.v("1") to self.dg_connection = self.deepgram_client.listen.asynclive.v("1"), I'm getting this error:

deepgram/clients/live/v1/async_client.py:219: RuntimeWarning: coroutine 'AsyncLiveClient._emit' was never awaited
  self._emit(LiveTranscriptionEvents.Error, error)
RuntimeWarning: Enable tracemalloc to get the object allocation traceback
Task exception was never retrieved
future: <Task finished name='Task-16' coro=<AsyncLiveClient._start() done, defined at deepgram/clients/live/v1/async_client.py:97>
Traceback (most recent call last):
  File "deepgram/clients/live/v1/async_client.py", line 114, in _start
    await self._emit(
  File "deepgram/clients/live/v1/async_client.py", line 95, in _emit
    await handler(self, *args, **kwargs)
TypeError: object NoneType can't be used in 'await' expression

Deepgram Docs

End of Speech Detection While Live Streaming

Learn how to use End of Speech when transcribing live streaming audio with Deepgram.

median bane Jan 31, 2024, 10:28 AM

#

Yup, we will error if you attempt to do <1000 or >5000 (I believe). However our SDKs also support on-prem, which can arguably make those values whatever their hardware is happy to support (low or high).

Docs and a better error back from the API would be ideal, and I will feed this back to the team.
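
Until the API returns a clearer error, a client-side check can catch an out-of-range value before the connection is attempted. This is a sketch only: the lower bound of 1000 is confirmed by the 400 response earlier in the thread, while the upper bound of 5000 comes from the "I believe" above and should be treated as an assumption. `check_utterance_end_ms` is a hypothetical helper, and the bounds are parameters because on-prem deployments may allow other values.

```python
# Lower bound confirmed by the 400 response in this thread.
MIN_UTTERANCE_END_MS = 1000
# Upper bound is an assumption taken from the "I believe" remark above.
MAX_UTTERANCE_END_MS = 5000


def check_utterance_end_ms(
    value: int,
    minimum: int = MIN_UTTERANCE_END_MS,
    maximum: int = MAX_UTTERANCE_END_MS,
) -> int:
    """Return `value` unchanged, or raise ValueError if it is out of range."""
    if not minimum <= value <= maximum:
        raise ValueError(
            f"utterance_end_ms={value} is outside the accepted range "
            f"[{minimum}, {maximum}]"
        )
    return value


if __name__ == "__main__":
    print(check_utterance_end_ms(1000))
    try:
        check_utterance_end_ms(999)
    except ValueError as exc:
        print(exc)
```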

tulip pagoda Jan 31, 2024, 2:38 PM

#

Thanks @median bane for the quick response.

#

Would you be able to help me with the latter issue?

After adjusting utterance_end_ms to 1000 and changing self.dg_connection = self.deepgram_client.listen.live.v("1") to self.dg_connection = self.deepgram_client.listen.asynclive.v("1"), I'm getting this error:

deepgram/clients/live/v1/async_client.py:219: RuntimeWarning: coroutine 'AsyncLiveClient._emit' was never awaited
  self._emit(LiveTranscriptionEvents.Error, error)
RuntimeWarning: Enable tracemalloc to get the object allocation traceback
Task exception was never retrieved
future: <Task finished name='Task-16' coro=<AsyncLiveClient._start() done, defined at deepgram/clients/live/v1/async_client.py:97>
Traceback (most recent call last):
  File "deepgram/clients/live/v1/async_client.py", line 114, in _start
    await self._emit(
  File "deepgram/clients/live/v1/async_client.py", line 95, in _emit
    await handler(self, *args, **kwargs)
TypeError: object NoneType can't be used in 'await' expression

signal kettle Jan 31, 2024, 2:53 PM

#

I haven't touched Python in a long time, but I think it should be...
self.dg_connection = await self.deepgram_client.listen.asynclive.v("1")

#

@tulip pagoda

median bane Jan 31, 2024, 3:00 PM

#

I think that might be right ☝️ I will ask @lime condor to have a quick look when he logs in today

#

(I'm not a Python nerd)

tulip pagoda Jan 31, 2024, 3:27 PM

#

Awesome thanks let me try that now i'll update you 🙂

#

hmmm nope that did not work:

    self.dg_connection = await self.deepgram_client.listen.asynclive.v("1")
TypeError: object AsyncLiveClient can't be used in 'await' expression
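
The error above says the value returned by .v("1") is an ordinary object, so there is nothing to await at that point; only calls that return a coroutine (such as start()) take `await`. A small sketch of that distinction, using a stand-in class rather than the real SDK client. `FakeAsyncLiveClient` and `make_client` are illustrative names only.

```python
import asyncio
import inspect


class FakeAsyncLiveClient:
    """Stand-in for illustration; NOT the real AsyncLiveClient."""

    async def start(self) -> bool:
        # A coroutine method: calling it returns an awaitable.
        return True


def make_client() -> FakeAsyncLiveClient:
    # A regular function: it returns the object directly.
    return FakeAsyncLiveClient()


async def main() -> bool:
    client = make_client()  # no await: the result is a plain object
    assert not inspect.isawaitable(client)
    return await client.start()  # await: start() returns a coroutine


if __name__ == "__main__":
    print(asyncio.run(main()))
```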

lime condor Jan 31, 2024, 3:47 PM

#

I think this was also posted to a github issue and was answered (maybe?)

tulip pagoda Jan 31, 2024, 3:47 PM

#

Yes i'm the same person!

#

however the latter was not addressed

lime condor Jan 31, 2024, 3:48 PM

#

there was a bug in 3.0.0 where you needed an async for the on_message

tulip pagoda Jan 31, 2024, 3:48 PM

#

Would you mind helping me out with the latter issue:

After adjusting utterance_end_ms to 1000 and changing self.dg_connection = self.deepgram_client.listen.live.v("1") to self.dg_connection = self.deepgram_client.listen.asynclive.v("1"), I'm getting this error:

deepgram/clients/live/v1/async_client.py:219: RuntimeWarning: coroutine 'AsyncLiveClient._emit' was never awaited
  self._emit(LiveTranscriptionEvents.Error, error)
RuntimeWarning: Enable tracemalloc to get the object allocation traceback
Task exception was never retrieved
future: <Task finished name='Task-16' coro=<AsyncLiveClient._start() done, defined at deepgram/clients/live/v1/async_client.py:97> exception=TypeError("Logger._log() got an unexpected keyword argument 'error'")>
Traceback (most recent call last):
  File "deepgram/clients/live/v1/async_client.py", line 114, in _start
    await self._emit(
  File "deepgram/clients/live/v1/async_client.py", line 95, in _emit
    await handler(self, *args, **kwargs)
TypeError: object NoneType can't be used in 'await' expression

lime condor Jan 31, 2024, 3:48 PM

#

for all the on_ hooks really

tulip pagoda Jan 31, 2024, 3:48 PM

#

I am using 3.1.1

signal kettle Jan 31, 2024, 3:48 PM

#

I can also confirm that there is a sample app that covers this.
https://github.com/deepgram/deepgram-python-sdk/blob/e2ddfb4f97def5e5c71910bf2a84b50c8a4ba349/examples/streaming/async_http/main.py#L30

GitHub

deepgram-python-sdk/examples/streaming/async_http/main.py at e2ddfb...

Official Python SDK for Deepgram's automated speech recognition APIs. - deepgram/deepgram-python-sdk

lime condor Jan 31, 2024, 3:49 PM

#

use https://github.com/deepgram/deepgram-python-sdk/blob/main/examples/streaming/async_http/main.py as a guide

GitHub

deepgram-python-sdk/examples/streaming/async_http/main.py at main ·...

Official Python SDK for Deepgram's automated speech recognition APIs. - deepgram/deepgram-python-sdk

tulip pagoda Jan 31, 2024, 3:50 PM

#

class TranscriptionService:
    def __init__(self, on_transcript_callback, on_interrupt_callback, audio_playing_callback, is_beginning_of_call):
        self.deepgram_client = DeepgramClient(api_key=DEEPGRAM_API_KEY, config=DeepgramClientOptions(verbose=20))

    async def start_transcription(self):
        self.dg_connection = self.deepgram_client.listen.asynclive.v("1")

        def on_message(_, result, **kwargs):
            print(f"RESULT: {result}")

        def on_utterance_end(_, utterance_end):
            print(f"[UTTERANCE_END]: \n\n{utterance_end}\n\n")

        def on_metadata(_, metadata):
            if metadata is None:
                return
            print(f"\n\nMetadata: {metadata}\n\n")

        def on_error(_, error):
            if error is None:
                return
            print(f"\n\nError: {error}\n\n")

        self.dg_connection.on(LiveTranscriptionEvents.Transcript, on_message)
        self.dg_connection.on(LiveTranscriptionEvents.Metadata, on_metadata)
        self.dg_connection.on(LiveTranscriptionEvents.Error, on_error)
        self.dg_connection.on(LiveTranscriptionEvents.UtteranceEnd, on_utterance_end)

        options = LiveOptions(
            model="nova-2",
            interim_results=False,
            language="en-US",
            encoding="mulaw",
            sample_rate="8000",
            punctuate=True,
            # endpointing=400,
            # utterance_end_ms=1000
        )

        await self.dg_connection.start(options)

    async def send_audio_data(self, data):
        if not self.dg_connection:
            print("Deepgram live transcription object is 'None'; cannot send data.")
            return  # Early exit if the transcription isn't started
        await self.dg_connection.send(data)

    async def stop_transcription(self):
        if self.dg_connection:
            await self.dg_connection.finish()
            self.dg_connection = None

#

Let me try adding async to the event functions

#

Ok nice that worked!
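
Why adding async fixed it: the traceback shows the async client running `await handler(self, *args, **kwargs)`. A plain `def` handler returns None when called, and None cannot be awaited, which is the TypeError reported above. The sketch below reproduces that pattern with a simplified stand-in emitter. `MiniEmitter` and `try_handler` are illustrative names, not SDK code.

```python
import asyncio


class MiniEmitter:
    """Simplified stand-in for the `await handler(...)` pattern in the traceback."""

    def __init__(self):
        self._handlers = []

    def on(self, handler):
        self._handlers.append(handler)

    async def emit(self, payload):
        for handler in self._handlers:
            # Awaiting the handler's return value requires a coroutine function.
            await handler(self, payload)


def sync_handler(_, payload):
    # Plain function: calling it returns None, which is not awaitable.
    return None


async def async_handler(_, payload):
    # Coroutine function: calling it returns an awaitable.
    return None


async def try_handler(handler) -> str:
    """Register one handler, emit once, and report whether the await succeeded."""
    emitter = MiniEmitter()
    emitter.on(handler)
    try:
        await emitter.emit({"type": "Results"})
        return "ok"
    except TypeError:
        return "TypeError"


if __name__ == "__main__":
    print(asyncio.run(try_handler(sync_handler)))
    print(asyncio.run(try_handler(async_handler)))
```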

signal kettle Jan 31, 2024, 3:52 PM

#

Glad to hear it.

median bane Jan 31, 2024, 3:52 PM

#

superb 👏

lime condor Jan 31, 2024, 3:53 PM

#

got it working?

tulip pagoda Jan 31, 2024, 3:53 PM

#

Quick other question:

How do I send a keepalive?
https://developers.deepgram.com/reference/listen-live#stream-keepalive

Deepgram Docs

Transcribe - Live audio

Transcribe Live Streaming AudioDeepgram provides its customers with real-time, streaming transcription via its streaming endpoints. These endpoints are high-performance, full-duplex services running over the tried-and-true WebSocket protocol, which makes integration with customer pipelines simple du...

#

yes it worked 🙂

lime condor Jan 31, 2024, 3:53 PM

#

on async, it isn't implemented, which means you currently need to send them yourself

tulip pagoda Jan 31, 2024, 3:54 PM

#

here's the working implementation:

import os

from deepgram import (
    DeepgramClient,
    DeepgramClientOptions,
    LiveOptions,
    LiveTranscriptionEvents,
)

# Assumption: the key is read from the environment; the original snippet did not show where it is defined.
DEEPGRAM_API_KEY = os.getenv("DEEPGRAM_API_KEY", "")


class TranscriptionService:
    def __init__(self, on_transcript_callback, on_interrupt_callback, audio_playing_callback, is_beginning_of_call):
        self.deepgram_client = DeepgramClient(api_key=DEEPGRAM_API_KEY, config=DeepgramClientOptions(verbose=20))
        # Set up front so send_audio_data()/stop_transcription() are safe to call before start_transcription()
        self.dg_connection = None

    async def start_transcription(self):
        self.dg_connection = self.deepgram_client.listen.asynclive.v("1")

        # With the async client, every event handler must be declared with `async def`
        async def on_message(_, result, **kwargs):
            print(f"RESULT: {result}")

        async def on_utterance_end(_, utterance_end):
            print(f"[UTTERANCE_END]: \n\n{utterance_end}\n\n")

        async def on_metadata(_, metadata):
            if metadata is None:
                return
            print(f"\n\nMetadata: {metadata}\n\n")

        async def on_error(_, error):
            if error is None:
                return
            print(f"\n\nError: {error}\n\n")

        self.dg_connection.on(LiveTranscriptionEvents.Transcript, on_message)
        self.dg_connection.on(LiveTranscriptionEvents.Metadata, on_metadata)
        self.dg_connection.on(LiveTranscriptionEvents.Error, on_error)
        self.dg_connection.on(LiveTranscriptionEvents.UtteranceEnd, on_utterance_end)

        options = LiveOptions(
            model="nova-2",
            interim_results=False,
            language="en-US",
            encoding="mulaw",
            sample_rate=8000,
            punctuate=True,
            # endpointing=400,
            # utterance_end_ms=1000
        )

        await self.dg_connection.start(options)

    async def send_audio_data(self, data):
        if not self.dg_connection:
            print("Deepgram live transcription object is 'None'; cannot send data.")
            return  # Early exit if the transcription isn't started
        await self.dg_connection.send(data)

    async def stop_transcription(self):
        if self.dg_connection:
            await self.dg_connection.finish()
            self.dg_connection = None

tulip pagoda Jan 31, 2024, 3:54 PM

#

lime condor on async, it isnt implemented which means, you need to currently send them yours...

How do I send that myself?

lime condor Jan 31, 2024, 3:54 PM

#

yea, yourself as in you need to implement this

#

https://developers.deepgram.com/reference/listen-live#stream-keepalive

tulip pagoda Jan 31, 2024, 3:55 PM

#

Cool but where would I need to pass { "type": "KeepAlive" }?

lime condor Jan 31, 2024, 3:56 PM

#

using send()

tulip pagoda Jan 31, 2024, 4:01 PM

#

ahh I see

#

thanks

median bane Jan 31, 2024, 4:01 PM

#

It should be sent as a string!

tulip pagoda Jan 31, 2024, 4:06 PM

#

It doesn't seem that it takes options params:

    async def send(self, data):
        """
        Sends data over the WebSocket connection.
        """
        self.logger.spam("AsyncLiveClient.send ENTER")
        self.logger.spam("data: %s", data)

        if self._socket:
            await self._socket.send(data)
            self.logger.spam("data sent")

        self.logger.spam("AsyncLiveClient.send LEAVE")

lime condor Jan 31, 2024, 4:07 PM

#

data can be sent as a string

tulip pagoda Jan 31, 2024, 4:07 PM

#

this is in deepgram/clients/live/v1/async_client.py

lime condor Jan 31, 2024, 4:07 PM

#

that's how the underlying protocol knows to write it as a text message vs binary

tulip pagoda Jan 31, 2024, 4:08 PM

#

ok but what should the format be since i'm sending the audio media payload as data:

await transcription_service.send_audio_data(chunk)

    async def send_audio_data(self, data):
        if not self.dg_connection:
            print("Deepgram live transcription object is 'None'; cannot send data.")
            return  # Early exit if the transcription isn't started
        await self.dg_connection.send(data)

lime condor Jan 31, 2024, 4:09 PM

#

a string

#

just send the keepalive as a string message

#

no encoding, no nothing
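
To make the string-versus-bytes point concrete: the same send() carries both kinds of payload, and the Python type decides the frame type. The sketch below assumes the underlying socket behaves like the Python `websockets` library, where a `str` goes out as a text frame and bytes-like data as a binary frame. `frame_kind` is a hypothetical helper for illustration.

```python
import json

# The KeepAlive message is plain JSON text, built once and reused.
KEEPALIVE_MESSAGE = json.dumps({"type": "KeepAlive"})


def frame_kind(payload) -> str:
    """Return which WebSocket frame type a payload of this Python type maps to."""
    if isinstance(payload, str):
        return "text"
    if isinstance(payload, (bytes, bytearray, memoryview)):
        return "binary"
    raise TypeError(f"unsupported payload type: {type(payload).__name__}")


if __name__ == "__main__":
    audio_chunk = b"\xff\x7f\xff\x7f"  # placeholder bytes, not real audio
    print(frame_kind(KEEPALIVE_MESSAGE))
    print(frame_kind(audio_chunk))
```

So the audio chunks stay as bytes and the keepalive is passed as a str, with no extra encoding step on either.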

tulip pagoda Jan 31, 2024, 4:12 PM

#

do I need to send it in addition to the audio chunk? Sorry for all these questions; i'm confused as to how to send it while i'm sending audio chunks in realtime

lime condor Jan 31, 2024, 4:12 PM

#

you obviously can't send the message at the same time

#

send(audio data)
send(audio data)
send(audio data)
send(keepalive)
send(audio data)
send(audio data)
send(audio data)
send(keepalive)
send(audio data)
send(audio data)

tulip pagoda Jan 31, 2024, 4:14 PM

#

i see!

so exactly like this:
send('{ "type": "KeepAlive" }')

lime condor Jan 31, 2024, 4:14 PM

#

in between sending the audio data, you need to send the keepalive

tulip pagoda Jan 31, 2024, 4:14 PM

#

?

lime condor Jan 31, 2024, 4:14 PM

#

yes

#

you are over thinking this a lot

tulip pagoda Jan 31, 2024, 4:14 PM

#

where { "type": "KeepAlive" } is a string
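
One way to interleave keepalives with audio, as described above, is a background task that sends the KeepAlive string on a timer while the audio path keeps calling send() as usual. This is a sketch under assumptions, not the SDK's implementation: `keepalive_loop` is a hypothetical helper, `RecordingConnection` is a test double with the same `async send(data)` shape as the SDK method quoted earlier, and the interval in the demo is arbitrary (take the real value from the keepalive documentation linked above).

```python
import asyncio
import json

KEEPALIVE_MESSAGE = json.dumps({"type": "KeepAlive"})


async def keepalive_loop(connection, interval_s: float, stop: asyncio.Event) -> None:
    """Send a KeepAlive text message every `interval_s` seconds until `stop` is set."""
    while not stop.is_set():
        await connection.send(KEEPALIVE_MESSAGE)
        try:
            # Sleep for the interval, but wake early if asked to stop.
            await asyncio.wait_for(stop.wait(), timeout=interval_s)
        except asyncio.TimeoutError:
            pass


class RecordingConnection:
    """Test double that records everything passed to send()."""

    def __init__(self):
        self.sent = []

    async def send(self, data):
        self.sent.append(data)


async def demo():
    connection = RecordingConnection()
    stop = asyncio.Event()
    task = asyncio.create_task(keepalive_loop(connection, 0.01, stop))
    await connection.send(b"\x00\x01")  # audio chunks use the same send()
    await asyncio.sleep(0.05)
    stop.set()
    await task
    return connection.sent


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In the TranscriptionService above, the task would be created after start() and stopped before finish(), passing self.dg_connection as the connection.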

lime condor Jan 31, 2024, 4:15 PM

#

ok, I have other things I need to do. I think you will figure this out eventually

tulip pagoda Jan 31, 2024, 4:16 PM

#

I am although the documentation isn't very clear but thanks a lot.

#

Sure; I am trying to get this right before we launch to production as we are moving to a Growth plan with Deepgram

lime condor Jan 31, 2024, 4:19 PM

#

it's all good

#

if you have any suggestions on how to improve the docs, let me know

tulip pagoda Jan 31, 2024, 4:30 PM

#

Yes I have a few suggestions will write them up for you

lime condor Jan 31, 2024, 4:38 PM

#

you can also use the sync/threaded version as a guide since it's implemented there. we do have an issue to implement it in async, but just haven't gotten around to it yet

tulip pagoda Jan 31, 2024, 5:07 PM

#

got you thanks for the suggestion

wide shuttle Jun 11, 2024, 4:33 PM

#

Can someone help

lime condor Jun 12, 2024, 5:14 PM

#

hi @wide shuttle this is a super old thread. can you start a new request in #1115960287183511643 and include all details for what you are trying to do. minimally the SDK version, the deepgram options being used, and information about the audio stream (where is it coming from and what is the encoding of that audio)
