#Custom LLM - streaming

silk ravine
#

Hi, are you sure you are streaming back the response? Can you please provide a conversation ID?

boreal moth
#

Hi! I don't have the conversation ID right now, but I can send it on Monday.

Meanwhile, the script I used for the test is the same one shared by ElevenLabs. I haven't modified anything other than adding a print statement to confirm that the chunks were being yielded correctly, so it should be easy for you to reproduce and verify the issue.

silk ravine
#

We start producing TTS only after we receive a full sentence. Could it be that your response was a single sentence?

boreal moth
# silk ravine So we start to produce TTS only after we receive a full sentence could it be you...

Hmm, that’s helpful information. However, we need to know what your backend considers a full sentence.

The sample code provided yields chunks as they arrive; it doesn't buffer them into sentences before returning. You can find the sample code here: https://elevenlabs.io/docs/conversational-ai/customization/byollm

I can adjust it to buffer the chunks and form the sentences before returning data, but I believe it should be handled on your backend.

Also, if I have to handle that, is there anything more specific? For example, what do you consider a full sentence? We need to handle cases with ‘.’, ‘?’, ‘!’, and maybe something else.
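For illustration, here is a minimal sketch of what that client-side buffering could look like. It assumes a sentence ends at '.', '?' or '!' followed by whitespace, which is only my guess and not a confirmed definition of what the backend treats as a full sentence. The name `buffer_sentences` is hypothetical and is not part of the sample code.

```python
import re
from typing import Iterable, Iterator

# Assumption: a sentence boundary is '.', '?' or '!' followed by whitespace.
# This is a guess, not the backend's actual rule.
_SENTENCE_END = re.compile(r"(?<=[.?!])\s+")


def buffer_sentences(chunks: Iterable[str]) -> Iterator[str]:
    """Accumulate streamed text chunks and yield them one sentence at a time."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        parts = _SENTENCE_END.split(buffer)
        # Every part except the last is a complete sentence.
        for sentence in parts[:-1]:
            if sentence:
                yield sentence
        # The last part may still be incomplete, so keep it buffered.
        buffer = parts[-1]
    # Flush whatever remains once the stream ends.
    if buffer.strip():
        yield buffer.strip()


if __name__ == "__main__":
    demo = ["One mo", "ment. Let me ", "check that", " for you! Done"]
    for sentence in buffer_sentences(demo):
        print(repr(sentence))
```

Note that with this rule a sentence is only released once the whitespace after its punctuation arrives (or the stream ends), and abbreviations such as "e.g." would be split incorrectly.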

I’d appreciate it if those rules could be better defined in the documentation or, ideally, if you could handle the buffering on your backend.

Thanks for your support so far! I’m looking forward to having this working properly!

silk ravine
#

Yes, we are handling that on our backend.

#

I am just wondering whether your custom LLM implementation returns long first sentences?

dull yew
#

Hello. I am experiencing the same issue: the conversational AI waits until the LLM completion SSE stream has finished before it produces any text to speech. As part of my testing, our LLM responds with "One moment." immediately on the stream. I have tried adding the full stop to that string so that it would be picked up as a full sentence, but ElevenLabs still waits until the end of the stream before speaking, at which point it speaks the entire completion response at once. Like Bruno, I have instrumented my code with logging to test this, and I have also run the completion requests through an HTTP debugger (Insomnia). I can see that the SSE stream is working as expected, with our "One moment." completion chunk being sent a second or two before the rest of the completion chunks. Any advice or support on this would be appreciated, thank you.
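For anyone trying to reproduce this, a rough sketch of the stream shape I am describing is below. It assumes OpenAI-style chat completion chunks sent as SSE `data:` lines ending with `data: [DONE]`, which is an assumption on my part and not something confirmed in this thread. The names `sse_chunk` and `stream_reply`, the `chatcmpl-example` id, and the `custom-llm` model name are all placeholders.

```python
import json
import time
from typing import Iterable, Iterator


def sse_chunk(content: str, model: str = "custom-llm") -> str:
    """Format one text delta as an SSE event (assumed OpenAI-style chunk)."""
    payload = {
        "id": "chatcmpl-example",  # placeholder id
        "object": "chat.completion.chunk",
        "created": int(time.time()),
        "model": model,  # placeholder model name
        "choices": [
            {"index": 0, "delta": {"content": content}, "finish_reason": None}
        ],
    }
    # An SSE event is a "data:" line followed by a blank line.
    return f"data: {json.dumps(payload)}\n\n"


def stream_reply(slow_chunks: Iterable[str]) -> Iterator[str]:
    """Send a short filler sentence first, then the slower completion chunks."""
    yield sse_chunk("One moment. ")
    for text in slow_chunks:
        yield sse_chunk(text)
    # Assumed end-of-stream marker.
    yield "data: [DONE]\n\n"


if __name__ == "__main__":
    for event in stream_reply(["The answer ", "is on its way."]):
        print(event, end="")
```

The filler is emitted before the slow chunks are even requested, so it should reach the client well ahead of the rest of the completion.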

silk ravine
#

Let us check!

#

Thanks for reporting

dull yew
# silk ravine Thanks for reporting

Good day. Please note that I re-tested this today and the problem still seems to be occurring. Any advice or support on this would be appreciated, thank you.

silk ravine
#

@boreal moth @dull yew fixed!

boreal moth