Server mode | ElevenLabs | Page 1

wanton gate Nov 3, 2024, 9:44 PM

#

Yes. Initially we will support OpenAI's chat completion API protocol. So you will receive messages and expect to return answers as per OAI's chat completion format.

We will publish a guide

prisma lynx Nov 4, 2024, 9:02 AM

#

Silly question but is Server Mode intended for us to integrate our own LLMs?

mossy solstice Nov 4, 2024, 2:37 PM

#

Will server mode provide a call identifier so that we can associate the LLM call with a specific call? So we can add additional context per call

wanton gate Nov 4, 2024, 11:41 PM

#

Hi yes, you will be able to pass extra_data dict at the start of the call that will be passed on to the server call

night folio Nov 8, 2024, 2:33 PM

#

Hi, I wanted to ask if we can pass custom system prompt and first sentence to LLM in converstion AI using sdk?

wanton gate Nov 9, 2024, 12:16 AM

#

Hi, this will be supported soon

flint drum Nov 13, 2024, 12:09 PM

#

In the docs here: https://elevenlabs.io/docs/developer-guides/BYOLLM, it looks like there's a user_id that gets passed in. Is that unique per user, per conversation? (We need to track some state variables over the course of a conversation. If there's a unique user_id, we can use a cache to track state.)

Separately, is a WebSockets 'Custom LLM' mode still on the cards? I'd love this with a WebSocket endpoint.

wanton gate Nov 13, 2024, 4:02 PM

#

In the BYOLLM, you can pass any data that you wish, they will be passed on to your server. We will improve the guide there.

#

Separately, is a WebSockets 'Custom LLM' mode still on the cards? I'd love this with a WebSocket endpoint.
We can, could you descrive what would you miss in the OpenAI's protocol?

flint drum Nov 13, 2024, 9:03 PM

#

Cool, thanks! So rather than passing the same data to each user, I really need to be able to identify each user/conversation separately... Is user_id generated by ElevenLabs and pre-populated?

To avoid an XY question: What I actually want to do is potentially call tools before responding with the LLM. The tools might change over the course of the conversation, (so at the start, Tool A is available, then after Tool A is called, Tool B becomes available, and so on.) One way I can think to accomplish that, is to track every conversation and store the state in a cache. (Then the next time that specific user hits the /v1/chat/completions endpoint, I can recall which tool options are available.)

Another option would be to inject unspoken metadata into the response, (e.g. { metadata: { state: "some_state", amount: "42" } }) so I can replay the state on the next request.

The way Retell handles this is by opening a WebSocket to the Custom LLM server for each conversation. Because it's a constant connection, it's easy to maintain state throughout the conversation. I'm not particularly attached to WebSockets. (It does mean you can send less data down pipe, though, since ElevenLabs might only need to send the last part of the conversation.) One challenge with that approach is that developers need to handle 'rollbacks' when responses go stale and need to be discarded, though.

wanton gate Nov 14, 2024, 1:41 AM

#

Yes rollbacks are a challenge.

So the way how we handle this is you provide us with a json custom_llm_extra_body in the first message at conversation start. This json can include user id etc.
Then at each request to your /v1/chat/completions those will be passed in elevenlabs_extra_body request body param.

flint drum Nov 14, 2024, 2:15 AM

#

Oh, that's actually really neat. Thanks! That sounds great. If we wanted to make it completely stateless: can we append metadata to custom_llm_extra_body on subsequent messages? That way we could accumulate state without having to cache it on the server. (Or does it only work on the first message?)

#

Actually, we would need to make it stateless, because if I just assign a user_id to each conversation and track the state that way, it's possible that some request goes through only to be discarded later, (i.e. rolled back), and it will pollute the state if we don't know that it has been rolled back 🤔

wanton gate Nov 14, 2024, 2:27 AM

#

Hm, I am wondering, if you provide conversation_id or we can also do in the extra_body if a subsequent request comes which does not include your previous answer, can you infer it was rolled back?

wanton gate Nov 14, 2024, 2:51 AM

#

Note, custom llm mode has now been made accessible.

flint drum Nov 14, 2024, 5:22 AM

#

👍 You're right – I can do a count of the messages, and then if it hasn't increased since the last iteration, we know it's a rollback. (I'm actually doing that with Retell, so I'm not sure why that didn't occur to me earlier.)

#

Congratulations on pushing the Custom LLM mode live!!

flint drum Nov 14, 2024, 5:56 AM

#

If I understand correctly, I need to set custom_llm_extra_body in first_message when I create an agent with /v1/convai/agents/create, is that right? (Does that mean we should create a new agent for each user that connects?) Or does something like a unique conversation_id or user_id get auto-populated?

#

Zooming back out, I think being able to send metadata updates to ElevenLabs while the call is ongoing would be ideal. Perhaps we could send a little JSON blob in the response header, with the updated metadata/state? And then if that response gets used, (i.e. not rolled back), it updates elevenlabs_extra_body or similar?

wanton gate Nov 14, 2024, 12:31 PM

#

So the custom_llm_extra_body should be send in the first message from client, we will add the support to the SDKs when creating a Conversation() it should include contextual info that is then passed to the custom LLM on each request.

mossy solstice Nov 14, 2024, 2:15 PM

#

So if we start sending custom_llm_extra_body with the first socket message, it will be added as extra info in the prompt today? Or it's only passed as data in the API call to our custom LLM?

wanton gate Nov 14, 2024, 2:20 PM

#

As a data in the API call to custom LLM. For extra info on prompts that's coming in 1-2 days

mossy solstice Nov 14, 2024, 2:21 PM

#

Awesome 😄 love the pace you guys are shipping at

wanton gate Mar 6, 2025, 4:24 AM

#

flint drum Congratulations on pushing the Custom LLM mode live!!

@flint drum We have rolled out a performance fix now. How did it go for you?

flint drum Mar 7, 2025, 11:28 AM

#

Hey! I'm sorry to say that I actually reverted back to Retell. Having a WebSocket for all the back-and-forth between the servers makes things a lot easier, especially when juggling state. Speed was also a factor. I'm curious what the performance improvements were, though. Is there a write-up or changelog I can take a peek at?

wanton gate Mar 11, 2025, 4:27 AM

#

flint drum Hey! I'm sorry to say that I actually reverted back to Retell. Having a WebSocke...

Thanks for reaching back! As part of our special security deployment of custom llm we lost the stream, so it hasn't been streamed through correctly. Now it should have the same performance as using OpenAI directly.

We provide all information you provide to us via extra_body, plus complete chat history, on top we will include metadata about the call, is anything missing?

Would love to have you back @flint drum haha, our transcription for example should be superior in speed and quality from what others have been saying: #📟│agents-chat message

#Server mode