#Eleven-Labs-Benchmarking/elevenlabs_webs...
What exactly does this test test?
This is the test that you saw me running. It measures how long each part of the process takes. It prints the chunks to your console in real time as they arrive, measures the latency between chunks, and reports the total time the WebSocket is open.
It strips away everything except the raw response from the server, so nothing can contaminate the test results other than your network path to the server.
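A minimal sketch of that kind of timing, not the actual benchmark script: it records the time to the first chunk, the gap between later chunks, and the total time. The names `time_chunks` and `fake_stream` are hypothetical, and `fake_stream` is only a stand-in for the audio chunks a real WebSocket would deliver.

```python
import time
from typing import Iterable, Iterator, List, Tuple


def time_chunks(chunks: Iterable[bytes]) -> Tuple[float, List[float], float]:
    """Return (time to first chunk, gaps between later chunks, total time) in seconds."""
    start = time.perf_counter()
    previous = start
    first = None
    gaps: List[float] = []
    for chunk in chunks:
        now = time.perf_counter()
        if first is None:
            first = now - start           # latency until the first chunk arrives
        else:
            gaps.append(now - previous)   # latency between consecutive chunks
        previous = now
        print(f"chunk of {len(chunk)} bytes at +{now - start:.3f}s")
    total = time.perf_counter() - start
    return (first if first is not None else total), gaps, total


def fake_stream(count: int, delay: float) -> Iterator[bytes]:
    """Hypothetical stand-in for a stream of PCM chunks from the server."""
    for _ in range(count):
        time.sleep(delay)
        yield b"\x00" * 1024
```

Calling `time_chunks(fake_stream(5, 0.05))` prints one line per chunk and returns the three measurements; in a real test the iterable would be the chunks read from the socket.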
Does it benchmark streaming input, output or both?
You can have it do either one. I know it's a little confusing because I didn't really share this with anyone, but you can use the text chunker, which simulates an LLM input stream, or you can use where it says text chunk and send everything effectively at once by leaving the delay time at its default of 0.0001.
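A rough sketch of what such a text chunker could look like, assuming it splits on whitespace and that the delay is in seconds; the real script may chunk differently, and `text_chunker` here is a hypothetical name.

```python
import time
from typing import Iterator


def text_chunker(text: str, delay: float = 0.0001) -> Iterator[str]:
    """Yield the text one word at a time, pausing `delay` seconds between pieces.

    A larger delay imitates an LLM producing tokens slowly; a tiny delay is
    close to sending the whole text at once.
    """
    for word in text.split():
        yield word + " "    # keep a trailing space so words stay separated
        time.sleep(delay)
```

For example, `for piece in text_chunker("Hello there world", delay=0.2): print(piece)` emits one word roughly every 0.2 seconds.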
If you have any trouble, I can get on a voice call and explain it to you, but that's basically what you need to know.
But I think this is a worthwhile test for you to run, because it lets you rule out your specific network or location and see whether you're getting these PCM chunks back as fast as advertised.
So this uses a VPN?
No, this just uses your straight connection on your computer.
I was confused earlier, I do make use of the input stream. As in, I send the initial text chunk. Apologies for the confusion earlier.
And looking at my input, I see the docs for the BOS message may have changed since I implemented it, so I'm going to try to update it.
You wait for a response from the 1st send, then send EOS?
I did make amendments such that the BOS message now has the Try Trigger Generation and GenerationConfig
Seems like you
Send BOS Message
Wait For Response
Get a response of IDK?
Then Send EOS Message
Then get chunks?
No wait you do
Send BOS
Send Input
Wait For a Response?
Send EOS
Wait For Chunks Responses?
I saw some room for improvements/modifications, but they have not changed the latency yet
Whether I send EOS after all chunks are received or after the 1st chunk, I get the same latency issue
I also only create the websocket when I need to send text input then close it when I get all the chunks
So I'm trying to match the Python code exactly
EOS always comes when you're totally done and ready to close the connection.
What would happen if you just call WebSocket close without sending EOS?
Anyway, I did try that order and technically have improved my code, but it still performs exactly the same
It would just stay open and you might not get all your chunks back. Part of how it knows to finish the generation is that you've told it you've sent all the text, so it's free to generate whatever remains in the buffer.
Wait a minute. "Totally done" to me says I got all my chunks and I'm done, but what you just said makes it sound like I just sent the input text and I'm done. Do you know which it is?
Well, that's exactly what I'm doing: in my simplified example, where I'm not actually streaming the text in, I send the text and then send the EOS right away. There's really no reason not to send the EOS as soon as you've sent the input text.
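The order discussed here (BOS, then the text, then EOS straight away) can be sketched as the list of JSON messages to send. The field names follow my reading of the ElevenLabs input-streaming docs at the time and may have changed, so treat them as assumptions and check the current docs; `build_messages`, the voice settings and the chunk schedule values are illustrative only.

```python
import json
from typing import List


def build_messages(text: str, api_key: str) -> List[str]:
    """Build the BOS, text and EOS messages in the order they would be sent."""
    bos = {
        "text": " ",                       # BOS: a single space opens the stream
        "voice_settings": {"stability": 0.5, "similarity_boost": 0.8},
        "generation_config": {"chunk_length_schedule": [50]},  # illustrative value
        "xi_api_key": api_key,
    }
    body = {
        "text": text.rstrip() + " ",       # text chunks end with a space
        "try_trigger_generation": True,
    }
    eos = {"text": ""}                     # EOS: empty text means "no more input"
    return [json.dumps(m) for m in (bos, body, eos)]
```

With a real connection, each string would be sent in turn and the audio chunks read afterwards until the server closes the stream. When text is streamed in piece by piece, only the middle message repeats; EOS still goes last.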
Like we discussed on that call earlier, in the script I showed you, change line 10 from info to debug and run the script again. You'll see in the console exactly what the WebSocket library is doing and when the EOS message is being sent.
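The thread does not show line 10 itself, so this is only a guess at its shape, assuming the script uses Python's standard `logging` module and the `websockets` package:

```python
import logging

# Assumed shape of the script's logging setup; the actual line is not shown here.
logging.basicConfig(
    level=logging.DEBUG,  # previously logging.INFO
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)

# Make sure the WebSocket library's own logger also emits debug output.
logging.getLogger("websockets").setLevel(logging.DEBUG)
```

At debug level the library should log each frame it sends and receives, which makes the position of the EOS message in the exchange visible.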
The purpose of the script is to help people understand exactly how the WebSocket API works. You can turn on debug mode to see the entire process in full detail, allowing you to understand each step of the way. That's essentially why I developed it, to address the confusion surrounding this topic.
@prisma echo I looked a little bit more into Unity. What library are you using for WebSockets? The default included library or a third-party library?
I'm using Unreal and I'm using the native websocket code provided in Unreal Engine.
Pretty sure that I've adjusted my code to make exactly the same calls that you're making so far.
After going through it and kind of refreshing my mind, I can better walk you through what I'm doing as a comparison to what you're doing.
I think they pretty much match at the moment, but my results are the same as before
Ah crap, for some reason I thought you were using Unity. Hold on, let me rethink this for a second.
Okay, apparently accomplishing this in Unreal is actually a lot more complex and you don't have async/await like in Python.
I don't know much about Unreal Engine, but I did some research and it seems like using the HTTP module to get the /stream API working might be a better option than dealing with the complexity of WebSockets. I understand that you mentioned having trouble getting it to work, but with the IHTTP library in Unreal, you might be able to achieve it using sample code or something similar.
Hmm where did you find that?
Looking at the code it seems to do essentially what I was doing via BP.
The problem is when you get the OnResponseReceived, it'll fire once and that is it
I'll go ahead and entertain that exact code and see what happens
@wintry grotto That code doesn't stream in, it just gets one response back with everything.
It is still faster than my PCM websocket chunking, which is unusually slow for me.
If I can sort out why the websocket chunking is slow for PCM, I imagine that would be better.
Unless I start chunking my text instead and get back the chunks as a stream, but IDK if that will lose context on the 11Labs side. If I break the text up like that, it might not generate as well
Hey good news. It just started working
I don't know what I did or what change exactly made it work but it works now
@prisma echo I am running into the same problems as you with ElevenLabs, Unreal Engine and Runtime Audio Importer. I also saw your messages on the Runtime Audio Importer Discord. Is there any chance we can compare the steps we take? And did you get it working with PCM or MP3 in the end?
What's your issue? As far as I can tell, only the PCM solution will work with Unreal for streaming.
If you don't care about streaming you can get mp3 working as well.
No, I am using websockets because I do need streaming! I was getting a weird stutter with MP3s, so I am now streaming in PCM format. There is still a little stutter occasionally. It's way less now with PCM, but it can be annoying
Wow. That's odd but glad to hear!!
Unless you need "input streaming", you're better off with the regular /stream API for various reasons.
Yes I thought so too but it seems Unreal's IHTTP does not support streaming APIs, it only returns the payload once the response is entirely complete. So it would be great to have an indication of what your process is @prisma echo, I'm beginning to think it might be an issue with the way I'm using Runtime Audio Importer
If he used the code I shared, maybe the answer lies there, if that's what he got working. I'm wondering whether, for you Unreal guys, it would be easier to have some sort of locally running proxy that could rebroadcast the chunks in a more compatible way.
@wintry grotto @pearl zenith Yes, the streaming API isn't going to work in Unreal the way 11Labs serves it up ATM.
I ended up getting websockets working.
It seems to be fine more often than not ATM. Are there occasional flukes with WebSocket latency due to server usage spikes? I see folks complain from time to time.
Another alternative I have yet to try is using HTTP streaming with Unreal doing the chunking. Not sure if that'll have cons.
I'm using websocket PCM with sample rate 14000 and there's just too much delay in between chunks causing a very annoying stutter every few words. I really don't know how to get it any better than it is, so if you have a smooth playback I'm curious what your setup is? Are you running websocket on a separate thread? Appending chunk by chunk to the streaming soundwave or buffering in between?
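One way to picture the "buffering in between" option, as opposed to appending chunk by chunk, is a small pre-buffer that holds back playback until a set amount of audio has arrived. This is a language-neutral sketch in Python, not Unreal or Runtime Audio Importer code. It assumes 16-bit mono PCM, and the class name and the 300 ms default are hypothetical.

```python
from collections import deque
from typing import Deque, Optional


class PrebufferedPcm:
    """Hold PCM chunks until enough audio is queued, then release them in order."""

    def __init__(self, sample_rate: int, prebuffer_ms: int = 300,
                 bytes_per_sample: int = 2) -> None:
        bytes_per_ms = sample_rate * bytes_per_sample / 1000
        self.threshold = int(prebuffer_ms * bytes_per_ms)
        self.queue: Deque[bytes] = deque()
        self.buffered = 0
        self.started = False
        self.finished = False

    def push(self, chunk: bytes) -> None:
        """Add a chunk from the network; start playback once the threshold is met."""
        self.queue.append(chunk)
        self.buffered += len(chunk)
        if self.buffered >= self.threshold:
            self.started = True

    def finish(self) -> None:
        """Call when the stream ends so the remaining audio is released."""
        self.finished = True
        self.started = True

    def pop(self) -> Optional[bytes]:
        """Return the next chunk to play, or None while (re)buffering."""
        if not self.started:
            return None
        if not self.queue:
            # Underrun: wait for the buffer to refill unless the stream is over.
            self.started = self.finished
            return None
        chunk = self.queue.popleft()
        self.buffered -= len(chunk)
        return chunk
```

The trade-off is a slightly later start in exchange for fewer gaps: a larger `prebuffer_ms` absorbs more jitter between chunks but adds that much delay before the first sound.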
I'm using Blueprint Websocket from the Unreal Marketplace. It seems they are just using Unreal's native WebSocket support. IDK what you might be using. I also had the same issue.
The last thing I did was try to copy the benchmark Python code exactly.
@prisma echo so sorry to bother you with all these tags but are you sending the requests word by word or sentence by sentence?
I currently send sentence by sentence, i.e. as long a text message as I want.
Or I guess paragraph by paragraph
Can be as short or long as I need to but haven't tried anything too crazy
Hi, I'm facing the same issue. Have you solved this? Would be grateful for some tips on fixing it