#Eleven-Labs-Benchmarking/elevenlabs_webs...

wintry grotto
#

We can do it in a thread here so as not to spam the main forum.

prisma echo
#

What exactly does this test test?

wintry grotto
#

This is the test that you saw me running. It measures how long each part of the process takes. It prints the chunks to your console in real time as they arrive, measures the latency between chunks, and reports the total time the WebSocket is open.

#

It strips away everything except the raw response from the server, so there are really no other factors that could contaminate the test results besides your network path to the server.
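
A rough sketch of that kind of measurement, not the actual script: `measure_chunks` and its `clock` parameter are hypothetical names, and the chunks can come from any iterable, so the timing logic can be tried without a live connection.

```python
import time
from typing import Iterable


def measure_chunks(chunks: Iterable[bytes], clock=time.perf_counter) -> dict:
    """Consume audio chunks, print each one as it arrives, and time them."""
    start = clock()
    arrivals = []  # seconds since start at which each chunk arrived
    total_bytes = 0
    for chunk in chunks:
        arrivals.append(clock() - start)
        total_bytes += len(chunk)
        print(f"chunk {len(arrivals)}: {len(chunk)} bytes at {arrivals[-1]:.3f}s")
    # Latency between consecutive chunks.
    gaps = [later - earlier for earlier, later in zip(arrivals, arrivals[1:])]
    return {
        "first_chunk_s": arrivals[0] if arrivals else None,
        "gaps_s": gaps,
        "total_s": clock() - start,
        "total_bytes": total_bytes,
    }
```

In a real run the iterable would be whatever yields the audio frames from the socket; passing a fake `clock` makes the numbers reproducible.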

prisma echo
#

Does it benchmark streaming input, output or both?

wintry grotto
#

You can actually have it do whatever you want. I know it's a little confusing because I haven't really shared this with anyone, but you can use the text chunker, which simulates an LLM input stream, or you can use the spot where it says text chunk and send the text essentially all at once by leaving the delay time at its default of 0.0001.
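
A minimal sketch of what a text chunker with a delay could look like. Only the 0.0001 default comes from the thread; splitting on whitespace and the helper name `collect` are assumptions for illustration, and the real script may split differently.

```python
import asyncio
from typing import AsyncIterator


async def text_chunker(text: str, delay: float = 0.0001) -> AsyncIterator[str]:
    """Yield the text word by word, pausing `delay` seconds between pieces.

    A tiny delay behaves like sending everything at once; a larger one
    imitates an LLM producing tokens over time.
    """
    for word in text.split():
        yield word + " "
        await asyncio.sleep(delay)


async def collect(text: str, delay: float = 0.0001) -> list:
    """Hypothetical helper: gather all pieces so the chunker can be inspected."""
    return [piece async for piece in text_chunker(text, delay)]


if __name__ == "__main__":
    print(asyncio.run(collect("Hello there world")))
```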

#

If you have any trouble, I can get on a voice call and explain it to you, but that's basically what you need to know.

#

But I think this is a worthwhile test for you to run, because that way you can rule out your specific network or location and see whether you're getting these PCM chunks back as fast as advertised.

prisma echo
#

So this uses a VPN?

wintry grotto
#

No, this just uses your straight connection on your computer.

prisma echo
#

I was confused earlier. I do make use of the input stream, as in I send the initial text chunk. Apologies for the confusion.

#

And looking at my input, I see the docs for the BOS message may have changed since I implemented it, so I'm going to try to update it.

#

Do you wait for a response from the first send, then send EOS?

#

I did make amendments so that the BOS message now has the Try Trigger Generation and GenerationConfig

#

Seems like you

Send BOS Message
Wait For Response
Get a response of IDK?
Then Send EOS Message

Then get chunks?

#

No wait you do
Send BOS
Send Input
Wait For a Response?
Send EOS

Wait For Chunks Responses?

prisma echo
#

I saw some room for improvements/modifications, but they have not helped latency yet

#

Whether I send EOS after all chunks are received or after the first chunk, I get the same latency issue

#

I also only create the websocket when I need to send text input then close it when I get all the chunks

#

So I'm trying to match the Python code exactly

wintry grotto
#

It would just stay open and you might not get all your chunks back. Part of how it knows to finish the generation is that you've told it you've sent all the text, so it's free to generate whatever remains in the buffer.

prisma echo
#

Wait a minute. "Totally done" to me means I got all my chunks and I'm done, but what you just said makes it sound like it means I just sent the input text and I'm done. Do you know which it is?

wintry grotto
#

Well, that's exactly what I'm doing: in my simplified example, where I'm not actually streaming the text in, I send the text and then send the EOS. There's really no reason not to send the EOS as soon as you've sent the input text.
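
The order discussed here (BOS, then the text, then EOS straight away) can be sketched as three JSON messages. The field names (`xi_api_key`, `generation_config`, `chunk_length_schedule`, `try_trigger_generation`) are my reading of the ElevenLabs WebSocket docs and should be checked against the current version; the function names and the schedule value are placeholders, not taken from the script.

```python
import json


def bos_message(api_key: str) -> str:
    """Beginning-of-stream message: a single space opens the stream."""
    return json.dumps({
        "text": " ",
        "xi_api_key": api_key,  # assumed field name, verify in the docs
        "generation_config": {"chunk_length_schedule": [50]},  # placeholder value
    })


def text_message(text: str) -> str:
    """One piece of input text, asking the server to start generating."""
    return json.dumps({"text": text, "try_trigger_generation": True})


def eos_message() -> str:
    """End-of-stream message: an empty string says no more text is coming."""
    return json.dumps({"text": ""})


def build_session(api_key: str, text: str) -> list:
    """Messages in send order; audio chunks are read only after all are sent."""
    return [bos_message(api_key), text_message(text), eos_message()]
```

Nothing in this sketch waits for a response between sends; the chunks are simply read afterwards until the server stops sending.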

#

Like we discussed on that call earlier, in the script I showed you, change line 10 from info to debug and run the script again. You'll see in the console exactly what the WebSocket library is doing and when the EOS message is being sent.
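
I can't reproduce line 10 of the script here, but with Python's standard `logging` module the info-to-debug switch usually looks something like the following. The logger name `tts_benchmark` and the log message are made up for illustration.

```python
import logging

# Switch this from logging.INFO to logging.DEBUG to see every step.
LOG_LEVEL = logging.DEBUG

logging.basicConfig(
    level=LOG_LEVEL,
    format="%(asctime)s %(name)s %(levelname)s %(message)s",
)

logger = logging.getLogger("tts_benchmark")  # hypothetical logger name
logger.setLevel(LOG_LEVEL)

# Only shown when the level is DEBUG.
logger.debug("sending EOS message")
```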

#

The purpose of the script is to help people understand exactly how the WebSocket API works. You can turn on debug mode to see the entire process in full detail, allowing you to understand each step of the way. That's essentially why I developed it, to address the confusion surrounding this topic.

#

@prisma echo I looked a little bit more into Unity. What library are you using for WebSockets? The default included library or a third-party library?

prisma echo
#

After going through it and kind of refreshing my mind, I can better walk you through what I'm doing as a comparison to what you're doing.

#

Think they pretty much match at the moment but my results were the same as before

wintry grotto
#

Ah crap, for some reason I thought you were using Unity. Hold on, let me rethink this for a second.

#

Okay, apparently accomplishing this in Unreal is actually a lot more complex and you don't have async/await like in Python.

#

I don't know much about Unreal Engine, but I did some research and it seems like using the HTTP module to get the /stream API working might be a better option than dealing with the complexity of WebSockets. I understand you mentioned having trouble getting it to work, but with the IHTTP library in Unreal you might be able to get there starting from some sample code.

prisma echo
#

Looking at the code it seems to do essentially what I was doing via BP.

The problem is when you get the OnResponseReceived, it'll fire once and that is it

#

I'll go ahead and try that exact code and see what happens

prisma echo
#

@wintry grotto That code doesn't stream in, it just gets one response back with everything.

It is still faster than my PCM websocket chunking, which is unusually slow for me.

#

If I can sort out why the websocket chunking is slow for PCM, I imagine that would be better.

#

Unless I start chunking my text instead and get the chunks back as a stream, but IDK if that will lose context on the 11Labs side. If I break the text up like that, it might not generate as well
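
For what it's worth, splitting on sentence boundaries is one simple way to chunk the text. This is only a sketch with a hypothetical `split_sentences` helper, and it says nothing about whether context carries over between separately sent pieces, which is the open question above.

```python
import re


def split_sentences(text: str) -> list:
    """Split text after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


if __name__ == "__main__":
    for sentence in split_sentences("Hi there. How are you? Fine!"):
        print(sentence)
```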

prisma echo
#

Hey good news. It just started working

#

I don't know what I did or what change exactly made it work but it works now

pearl zenith
#

@prisma echo I am running into the same problems as you with ElevenLabs, Unreal Engine and Runtime Audio Importer. I also saw your messages on the Runtime Audio Importer Discord. Is there any chance we can compare the steps we take? And did you get it working with PCM or MP3 in the end?

prisma echo
#

@wintry grotto @pearl zenith Yes, the streaming API isn't going to work in Unreal the way 11Labs serves it up ATM.

I ended up getting websockets working.

It seems to be fine more often than not ATM. Are there occasional flukes with WebSocket latency due to server usage spikes? I see folks complain from time to time.

Another alternative I have yet to try is using HTTP streaming but having Unreal do the chunking. Not sure if that'll have cons.

pearl zenith
#

I'm using websocket PCM with sample rate 14000 and there's just too much delay in between chunks causing a very annoying stutter every few words. I really don't know how to get it any better than it is, so if you have a smooth playback I'm curious what your setup is? Are you running websocket on a separate thread? Appending chunk by chunk to the streaming soundwave or buffering in between?
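
On the "buffering in between" option: one common approach is to hold back the first chunks until a small amount of audio is queued and only then start handing data to playback. Below is a language-neutral sketch in Python; in Unreal this would be C++ or Blueprint. It assumes 16-bit mono PCM, and `PrebufferQueue` and the 0.25 s default are invented for illustration.

```python
from collections import deque


class PrebufferQueue:
    """Hold PCM chunks until enough audio is queued, then release them in order."""

    def __init__(self, sample_rate: int, prebuffer_s: float = 0.25,
                 bytes_per_sample: int = 2):
        # Assumes mono audio: bytes per second = sample rate * bytes per sample.
        self.bytes_per_second = sample_rate * bytes_per_sample
        self.threshold = int(prebuffer_s * self.bytes_per_second)
        self.pending = deque()
        self.buffered = 0
        self.started = False

    def push(self, chunk: bytes) -> list:
        """Add a chunk; return the chunks that are ready to hand to playback."""
        self.pending.append(chunk)
        self.buffered += len(chunk)
        if not self.started and self.buffered < self.threshold:
            return []  # still filling the initial buffer
        self.started = True
        return self.flush()

    def flush(self) -> list:
        """Release everything queued, e.g. after the final chunk arrives."""
        ready = list(self.pending)
        self.pending.clear()
        self.buffered = 0
        return ready
```

A larger `prebuffer_s` trades a later start for more tolerance to gaps between chunks. Call `flush()` once the stream ends so a short utterance that never reaches the threshold still plays.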

pearl zenith
#

@prisma echo so sorry to bother you with all these tags but are you sending the requests word by word or sentence by sentence?

prisma echo
#

Or I guess paragraph by paragraph

#

Can be as short or long as I need to but haven't tried anything too crazy

drifting zodiac