The automatic switching of LLM is | ElevenLabs | Page 1

night hare Apr 3, 2025, 11:36 AM

#

Hi Alexander, what is your "high-intelligence" model? It is true that we fall back to Gemini by default. Fairly soon we're planning to give users the ability to customize which LLMs we fall back to

#

however, one thing to note is that fallbacks should be rare. Perhaps there is another underlying issue affecting your conversations; would you mind sharing an example conversation ID where the problem happened? This will help me investigate in detail

#

In general, yes, the fallback mechanism is imperfect. It is quite difficult to recover from a failure partway through generating a message, which unfortunately can happen with most LLM providers. Anecdotally, over 90% of failures occur atomically (the given LLM cannot generate any tokens at all), but the remaining ones can cause some subtle issues.
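The difference between atomic and mid-stream failures can be sketched in Python. This is a minimal illustration under stated assumptions, not ElevenLabs' implementation: each provider is assumed to be a generator of tokens that raises a hypothetical `ProviderError` on failure, and `stream_with_fallback`, `StreamResult` and the three demo providers are names invented for this example. A failure before the first token is recovered by moving on to the next provider; a failure after tokens have been emitted can only be reported, because those tokens have already been spoken.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, List, Tuple


class ProviderError(Exception):
    """Raised by a (hypothetical) provider when generation fails."""


@dataclass
class StreamResult:
    provider: str                                    # provider whose output was used
    tokens: List[str] = field(default_factory=list)  # tokens already sent onward
    interrupted: bool = False                        # True if it failed mid-stream


Provider = Tuple[str, Callable[[str], Iterator[str]]]


def stream_with_fallback(providers: List[Provider], prompt: str) -> StreamResult:
    """Try providers in order, distinguishing atomic and mid-stream failures."""
    for name, generate in providers:
        emitted: List[str] = []
        try:
            for token in generate(prompt):
                # In a live agent this token would already be on its way to TTS.
                emitted.append(token)
            return StreamResult(provider=name, tokens=emitted)
        except ProviderError:
            if not emitted:
                # Atomic failure: nothing was said yet, so switching is invisible.
                continue
            # Mid-stream failure: the partial reply cannot be taken back.
            return StreamResult(provider=name, tokens=emitted, interrupted=True)
    raise ProviderError("all providers failed")


# Hypothetical providers for illustration only.
def overloaded(prompt: str) -> Iterator[str]:
    raise ProviderError("Overloaded")
    yield  # unreachable; makes this function a generator


def flaky(prompt: str) -> Iterator[str]:
    yield "The viewing"
    raise ProviderError("connection reset")


def healthy(prompt: str) -> Iterator[str]:
    yield from ["Sure,", " one", " moment."]


print(stream_with_fallback([("primary", overloaded), ("fallback", healthy)], "hi"))
print(stream_with_fallback([("primary", flaky), ("fallback", healthy)], "hi"))
```

Anything beyond reporting the interruption, such as handing the partial reply to the next model as context, is where the subtle issues mentioned above come in.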

#

I'd be happy to chat more about this if you're available

#

especially if you have concrete conversation IDs where things went wrong; that would really, really help

gilded coral Apr 3, 2025, 12:29 PM

#

night hare Hi Alexander, what is your "high-intelligence" model? It is true we fallback wit...

Hi Kacper. Sonnet 3.5, Sonnet 3.7, or GPT-4o are sufficiently capable for our requirements. We are opting for Sonnet 3.7 currently.

Conversation ids:
eQoAY7HILxlOFI4lCgMC - example of long silence
kkEpi08q27Chk3rRd6q1 - example of Gemini not reasoning well, saying "The viewing is not booked or confirmed until I have this information." to the user.
5NSEDTh7F9ru1vIpysSl - example of Gemini reading UUIDs

Looking at our call history, there were 18 production calls yesterday. 6 were immediately disconnected by the user, which leaves 12 customer- or end-customer-facing calls. Of those 12, only 1 didn't fall back to Gemini, so 11 of 12 (about 92%) did.

If we can crack these issues, our market is going to go wild for this product. Right now though we're facing pressure that the solution is not reliable enough.

Our target environment is telephony, so we can consider UX workarounds, such as having the agent say "Sorry, could you bear with me a moment please" or "Sorry, I lost you there, what did you say?" as natural ways of recovering from technical errors happening under the hood. Humans do this all the time. Approaches like this may handle complex failure cases, such as the mid-stream failures you describe. Because we hold a socket on the conversation, we have custom layers above ElevenLabs where we can implement robustness ourselves where needed, provided ElevenLabs gives us enough control to do so. At the moment, I don't believe we have any control over the fallback behaviour. From a cursory glance at the documentation, it also doesn't appear that we can detect in real time when a fallback LLM is being used, or receive any event information on errors.

Happy to discuss/chat about this with you. I am in the British Summer Time zone (BST, GMT+1).

night hare Apr 3, 2025, 12:30 PM

#

gilded coral Hi Kacper. Sonnet 3.5, Sonnet 3.7, or GPT-4o are sufficiently capable for our re...

Are you in London?

gilded coral Apr 3, 2025, 12:31 PM

#

night hare Are you in London?

Nottingham, UK. I am often in London and can arrange meetings there at natural opportunities to do so, or can make specific arrangements.

night hare Apr 3, 2025, 12:32 PM

#

I see, I was just curious. I'm in the UK as well

#

anyway back to the topic

#

thanks a lot for the conversations, I'll see what I can do for you there 🙂

gilded coral Apr 3, 2025, 12:32 PM

#

I appreciate your time.

night hare Apr 3, 2025, 12:33 PM

#

customer choice of fallback LLMs should be quite easy and we've been considering this for a while

#

solving the mid-message transition issues is approximately 100x harder, but I'll look into your conversation logs and see if that is the problem (most likely not; this should be very, very rare)

gilded coral Apr 3, 2025, 12:35 PM

#

Thank you. There's a conversation in the history somewhere where we can see both LLMs responding, which might be the specific technical issue you are describing, but I couldn't find it just now.

#

In terms of resolution, being able to fall back to Sonnet 3.5 or GPT-4o may solve this for us.

night hare Apr 3, 2025, 12:36 PM

#

I strongly suspect there is some other underlying issue causing your problems

gilded coral Apr 3, 2025, 12:36 PM

#

I would be interested in an error event on the socket.

night hare Apr 3, 2025, 12:36 PM

#

let me dig into this 🙂

gilded coral Apr 3, 2025, 12:36 PM

#

FYI we have custom audio processing in front of you

#

and custom audio and transcription processing after you

#

Our solution stack is reasonably developed and given control, we can dig ourselves out of problems without troubling you.

night hare Apr 3, 2025, 12:37 PM

#

It should not be necessary, this should just work 🙂

gilded coral Apr 3, 2025, 12:38 PM

#

Of course. Thank you. As you will appreciate, sometimes stakeholder pressure requires same-day answers.

#

Control in our hands gives me recourse, and is always useful. But this is an aside.

night hare Apr 3, 2025, 12:38 PM

#

but we are making progress on surfacing error messages; a PR merged today should highlight crashes much better

gilded coral Apr 3, 2025, 12:39 PM

#

I have been following your progress since December. Thank you for the work. I will let you get back to your day, and I look forward to your feedback.

night hare Apr 3, 2025, 12:42 PM

#

Oki, so for the first example conversation, Claude 3.7 is bouncing us with an "Overloaded" error message 😦

#

would the ability to choose the fallback LLM help here?

#

in this example no matter how many times we send a request to Anthropic it's gonna fail

gilded coral Apr 3, 2025, 12:43 PM

#

This follows. We don't have the data to be sure, but we believe we see an increased rate of failure at peak times of day.

night hare Apr 3, 2025, 12:43 PM

#

that's why we try different providers when this happens

gilded coral Apr 3, 2025, 12:44 PM

#

Indeed. So, my thoughts are that Claude 3.5 may be in a different availability group, and GPT-4o is from a different provider.

#

Both of these as fallback LLMs have sufficient model intelligence to support our use case.
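A customer-chosen fallback order, combined with temporarily skipping a model that has just returned "Overloaded", might look like the sketch below. It is a hypothetical client-side policy, not an ElevenLabs setting (at the time of this thread the fallback order was not configurable): `FallbackChain`, `report_overloaded`, `pick`, the 60-second cooldown and the model labels are all assumptions for illustration. The cooldown reflects the point above that re-sending the same request to an overloaded provider keeps failing.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FallbackChain:
    """Hypothetical client-side fallback policy.

    `models` is the customer's preferred order; the labels are illustrative,
    not real API model identifiers.
    """
    models: List[str]
    cooldown_s: float = 60.0
    _blocked_until: Dict[str, float] = field(default_factory=dict, init=False, repr=False)

    def report_overloaded(self, model: str, now: float) -> None:
        # Re-sending the same request to an overloaded model is unlikely to
        # succeed, so skip that model until its cooldown expires.
        self._blocked_until[model] = now + self.cooldown_s

    def pick(self, now: float) -> Optional[str]:
        # First model, in the customer's order, that is not cooling down.
        for model in self.models:
            if self._blocked_until.get(model, 0.0) <= now:
                return model
        return None  # every model is cooling down: fall back to a UX-level recovery


chain = FallbackChain(["sonnet-3.7", "sonnet-3.5", "gpt-4o"])
chain.report_overloaded("sonnet-3.7", now=0.0)
print(chain.pick(now=1.0))
```

With this order, an "Overloaded" error from Sonnet 3.7 makes `pick` return Sonnet 3.5 until the cooldown expires. Whether Sonnet 3.5 really sits in a separate availability group is left open in the thread, which is why GPT-4o is kept in the list as a different-provider option.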

night hare Apr 3, 2025, 12:45 PM

#

oki, I'll try to implement fallback choice soon! 🙂

gilded coral Apr 3, 2025, 12:45 PM

#

In terms of a bulletproof solution, I would personally want an error event, or a critical error event, on the socket.

#

We would disconnect ElevenLabs

#

we would tell the user "one moment please"

#

we would then reconnect and retry

#

If, due to circumstances out of our control, we experience total failure, we can redirect the call

#

Alongside your cascade mechanisms and choice of model, I believe this would be bulletproof.
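The recovery sequence proposed in these messages (disconnect, say "one moment please", reconnect and retry, redirect on total failure) can be sketched as a single handler. This assumes a critical error event on the socket, which, per this thread, was not available at the time; `recover_from_error`, its callback parameters and the retry count are hypothetical stand-ins for the caller's own telephony and socket code.

```python
from typing import Callable


def recover_from_error(
    disconnect: Callable[[], None],
    say: Callable[[str], None],
    connect: Callable[[], bool],
    redirect_call: Callable[[], None],
    max_retries: int = 2,
) -> bool:
    """Handle a hypothetical critical error event from the agent socket.

    Returns True if a new agent session was established, False if the call
    had to be redirected.
    """
    disconnect()                # 1. drop the failed agent session
    say("One moment, please.")  # 2. cover the gap the way a human would
    for _ in range(max_retries):
        if connect():           # 3. reconnect; True means a new session is up
            return True
    redirect_call()             # 4. total failure: hand the call elsewhere
    return False
```

In a real deployment, `connect` would open a new conversation session and `redirect_call` would transfer the call. Keeping them as injected callables lets the control flow be exercised without a live socket.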

night hare Apr 3, 2025, 12:46 PM

#

ideally on our side we limit total failures to 0.000001% so that you don't have to worry about this 🙂

gilded coral Apr 3, 2025, 12:48 PM

#

Thank you. If there are any indications of any reasons for our high incident rate of falling back, please do let me know. Have a good afternoon.

night hare Apr 3, 2025, 1:24 PM

#

To resolve long silence issues, you may wish to decrease the "turn timeout" value from 15s to something smaller

#

for the first conversation provided, the long pause happens because the LLM said something, and then it was the caller's turn to speak

#

only when it timed out after 15s (which is probably too long; our default is 7s) could the agent follow up

#

Have a look at prompting best practices, too
https://elevenlabs.io/docs/conversational-ai/best-practices/prompting-guide

#

Having a well-structured prompt with the list of tools at the bottom could help reduce LLM confusion as well

night hare Apr 3, 2025, 2:00 PM

#

Claude 3.7 seems to be overloaded far too often. Could you please try GPT-4 or Claude 3.5? They should be more reliable. I'll see what we can do about Claude 3.7

night hare Apr 3, 2025, 2:28 PM

#

@mint bridge

gilded coral Apr 8, 2025, 12:49 PM

#

night hare To resolve long silence issues, you may wish to decrease "turn timeout" value fr...

Thank you. Our customer said this delay was 90 seconds, but I can see it was actually around 17-18 seconds. It was reported because the fallback LLM had said something that did not make sense to the user, so they held, expecting something to happen.

Our turn timeout has been set to 15 seconds as a workaround for an issue where the turn timeout applied even while it was the agent's turn to speak. For example, if the timeout was 10 seconds and it took the agent 8 seconds to speak its turn, the user would have only 2 seconds before the timeout occurred. Worse, if the agent took longer than 10 seconds to speak its turn, the turn timeout would be triggered immediately. Do you know if this issue has been fixed?

gilded coral Apr 8, 2025, 12:52 PM

#

night hare Have a look at prompting best practices, too https://elevenlabs.io/docs/conversa...

Duly noted, and we will experiment with our system prompt, though we are not having any issues with LLM reasoning and capability, provided our agent is using the expected LLM.

night hare Apr 9, 2025, 8:52 AM

#

@gilded coral the turn timeout is counted from when the agent finishes speaking
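The two timer semantics discussed in this exchange can be compared using the numbers from the earlier message (a 10-second timeout with 8 or 12 seconds of agent speech). `user_window` is a hypothetical helper for this arithmetic only, not part of any ElevenLabs API.

```python
def user_window(timeout_s: float, agent_speech_s: float, counted_from_end: bool) -> float:
    """Hypothetical helper: seconds the user has to reply before the turn timeout fires."""
    if counted_from_end:
        # Timer starts once the agent stops speaking: the user gets the full timeout.
        return timeout_s
    # Timer starts with the agent's turn: speech time eats into the user's window.
    return max(0.0, timeout_s - agent_speech_s)


print(user_window(10.0, 8.0, counted_from_end=False))   # 2.0
print(user_window(10.0, 12.0, counted_from_end=False))  # 0.0
print(user_window(10.0, 8.0, counted_from_end=True))    # 10.0
```

Since the timeout is counted from the end of the agent's speech, the user always gets the full value, so the original reason for the 15-second workaround may no longer apply.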
