# How to identify the "final" audio chunk of a sequence when streaming in real time?


dusty tide

I'm currently using a websocket connection to stream text and receive audio chunks.
Is there any way to know when the final audio chunk for a text part has been received? This would help segment audio chunks and map them to text sections during conversations.

When flushing after sending a text segment, the next received audio chunk could still belong to an earlier text part, right?
I wonder how to tackle this, and whether there is a way at all.
It would be nice if we could flush with an ID and receive an audio chunk tagged with that ID, so we know everything up to that flush has been processed.
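A minimal client-side sketch of what such an ID-based flush would enable. Everything here is hypothetical: the message shapes (`{"type": "audio", ...}` and `{"type": "flush_done", "flush_id": ...}`) and the `SegmentMapper` class are not an existing API. It also assumes the server processes flushes and emits audio strictly in send order, so every chunk that arrives before a `flush_done` belongs to the oldest unfinished segment.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field


@dataclass
class Segment:
    # One text section sent to the server, plus the audio attributed to it.
    segment_id: str
    text: str
    audio: list[bytes] = field(default_factory=list)
    done: bool = False


class SegmentMapper:
    """Attributes streamed audio chunks to text segments.

    HYPOTHETICAL protocol: each flush carries an ID, and the server sends
    {"type": "flush_done", "flush_id": ...} after the last audio chunk for
    that flush. Audio is assumed to arrive in the order text was sent.
    """

    def __init__(self) -> None:
        self.pending: deque[Segment] = deque()  # sent, not yet completed
        self.completed: list[Segment] = []

    def on_flush_sent(self, segment_id: str, text: str) -> None:
        # Call right after sending the text followed by a flush(ID) message.
        self.pending.append(Segment(segment_id, text))

    def on_message(self, msg: dict) -> None:
        if not self.pending:
            raise RuntimeError("received a message with no pending segment")
        current = self.pending[0]  # oldest unfinished segment owns the audio
        if msg["type"] == "audio":
            current.audio.append(msg["audio"])
        elif msg["type"] == "flush_done":
            if msg["flush_id"] != current.segment_id:
                raise RuntimeError(f"unexpected flush_done: {msg['flush_id']}")
            current.done = True
            self.completed.append(self.pending.popleft())


if __name__ == "__main__":
    mapper = SegmentMapper()
    mapper.on_flush_sent("s1", "Hello there.")
    mapper.on_flush_sent("s2", "How are you?")
    # Simulated stream: audio for s1 still arrives after s2 was sent.
    for msg in [
        {"type": "audio", "audio": b"\x01"},
        {"type": "audio", "audio": b"\x02"},
        {"type": "flush_done", "flush_id": "s1"},
        {"type": "audio", "audio": b"\x03"},
        {"type": "flush_done", "flush_id": "s2"},
    ]:
        mapper.on_message(msg)
    for seg in mapper.completed:
        print(seg.segment_id, len(seg.audio))
```

The marker does the real work here: without it, the client cannot tell whether chunk `\x03` is the tail of `s1` or the start of `s2`.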

broken whale

+1
Flush completion events are still needed, and/or word-level timestamps.
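With word-level timestamps, audio could be mapped to text by time instead of by markers. A rough sketch, assuming a hypothetical alignment list with one `(word, start_s, end_s)` entry per whitespace-separated word, in send order, measured in seconds from the start of the audio stream. `WordTiming`, `segment_time_ranges` and `samples_for_range` are illustrative names, not part of any real API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class WordTiming:
    # HYPOTHETICAL alignment entry; times are seconds from stream start.
    word: str
    start_s: float
    end_s: float


def segment_time_ranges(
    segments: list[str], timings: list[WordTiming]
) -> list[tuple[float, float]]:
    """Return the (start_s, end_s) audio range covering each text segment.

    Assumes exactly one timing per whitespace-separated word, in order.
    """
    ranges: list[tuple[float, float]] = []
    i = 0
    for text in segments:
        n = len(text.split())
        if n == 0:
            # Empty segment: zero-length range at the previous end.
            t = ranges[-1][1] if ranges else 0.0
            ranges.append((t, t))
            continue
        words = timings[i : i + n]
        if len(words) < n:
            raise ValueError(f"not enough timings for segment {text!r}")
        ranges.append((words[0].start_s, words[-1].end_s))
        i += n
    return ranges


def samples_for_range(start_s: float, end_s: float, sample_rate: int) -> slice:
    # Convert a time range into a slice over decoded PCM samples.
    return slice(round(start_s * sample_rate), round(end_s * sample_rate))
```

Unlike the flush-ID approach, this also gives positions inside a segment, but it depends on the word count of the sent text matching the alignment exactly.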