#I don’t think I understand what you are


jovial blaze
#

Let's see if I can explain it better. Each conversation is a series of turns, and each turn basically consists of: (1) get the transcript, (2) call the LLM, (3) do in parallel: (3-a) stream the LLM output to the 11labs websocket (producer), and (3-b) listen for synthesized audio frames on the websocket (consumer). How do I know for certain that I have received all the synthesized audio frames for one turn, so that I can close the consumer for this turn? If I understand correctly, I cannot send the EOS message from the producer for this turn, as it will close the websocket, which would prevent me from using it for the rest of the session. I also cannot check whether the final synthesized speech has the same characters as the text sent by the producer, as I have observed that they sometimes differ (some punctuation or spaces are added in the 11labs result).
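
To make the turn structure concrete, here is a rough Python sketch of steps (3-a) and (3-b). It is only an illustration under stated assumptions: `FakeTTSSocket`, `send_text`, `recv_frame`, `run_turn`, and `run_session` are made-up names, the websocket is replaced by an in-memory stand-in, and nothing here is the 11labs API. The stand-in returns exactly one frame per text chunk, so the consumer knows how many frames to wait for. The real service gives no such count, and that is exactly the open question.

```python
import asyncio


class FakeTTSSocket:
    """In-memory stand-in for the TTS websocket (hypothetical, not the 11labs API)."""

    def __init__(self):
        self._frames = asyncio.Queue()

    async def send_text(self, text):
        # Assumption: every text chunk yields exactly one audio frame.
        await self._frames.put(f"audio<{text}>".encode())

    async def recv_frame(self):
        return await self._frames.get()


async def producer(sock, llm_chunks):
    # (3-a) stream the LLM output chunks to the socket.
    for chunk in llm_chunks:
        await sock.send_text(chunk)


async def consumer(sock, expected_frames):
    # (3-b) collect audio frames until the turn is considered complete.
    # The frame count is only known here because of the stand-in's
    # one-frame-per-chunk assumption; a real consumer needs another signal.
    frames = []
    while len(frames) < expected_frames:
        frames.append(await sock.recv_frame())
    return frames


async def run_turn(sock, llm_chunks):
    # Producer and consumer run in parallel for a single turn.
    _, frames = await asyncio.gather(
        producer(sock, llm_chunks),
        consumer(sock, len(llm_chunks)),
    )
    return frames


async def run_session(turns):
    # One socket is reused for every turn of the conversation.
    sock = FakeTTSSocket()
    return [await run_turn(sock, chunks) for chunks in turns]


if __name__ == "__main__":
    print(asyncio.run(run_session([["hi", "how", "are you"], ["fine"]])))
```

The part I cannot fill in for the real websocket is the loop condition in `consumer`: something has to tell it that the last frame of the turn has arrived, without closing the connection.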

tribal dust
#

Let me clarify: you are talking about your custom agent, which utilizes 11labs TTS streaming?

jovial blaze
#

I connect to the 11labs websocket to do streaming TTS. I push chunks of the LLM output to the websocket. If my LLM returns 3 chunks ["hi", "how", "are you"] in one turn, I would like to know when I have received from the 11labs websocket all the audio frames that correspond to "hi how are you".

tribal dust
jovial blaze
#

So that is the point I do not understand. To get an isFinal in one of the generated audio messages, I need to send an EOS text over the websocket. And this will also close the websocket, no?

tribal dust
#

If by EOS text you mean something that ends with a dot (or equivalent) - then no, it will not close the websocket. I personally haven't used it, but based on the documentation and common sense this should not happen.

jovial blaze
#

By EOS I mean an empty string "". Otherwise, in what cases do you get isFinal=True?

tribal dust
#

Ok, an empty string will end the connection. Why is that a problem?

jovial blaze
#

If I have to send an empty string at each turn, then I expect the websocket to be closed at each turn. That brings me back to opening and closing a websocket at each turn, which is not what you recommended (keeping one websocket open for the whole conversation).