#The automatic switching of LLM is
Hi Alexander, what is your "high-intelligence" model? It is true that we fall back to Gemini by default. Fairly soon we're planning to give users the ability to customize which LLMs we fall back to
however, one thing to note is that the fallbacks should be rare. Perhaps there is another underlying issue affecting your conversations, would you mind sharing an example Conv ID where the problem happened? This will help me investigate in detail
In general, yes, the fallback mechanism is imperfect. It is quite difficult to recover from failures mid-message generation, which are unfortunately possible with most LLM providers. Anecdotally, more than 90% of failures occur atomically (it's not possible to generate any tokens with a given LLM), but the remaining ones can cause some subtle issues.
I'd be happy to chat more about this if you're available
especially if you have concrete conversation IDs where things went wrong that would really really help
Hi Kacper. Sonnet 3.5, Sonnet 3.7, or GPT-4o are sufficiently capable for our requirements. We are opting for Sonnet 3.7 currently.
Conversation ids:
eQoAY7HILxlOFI4lCgMC - example of long silence
kkEpi08q27Chk3rRd6q1 - example of Gemini not reasoning well, saying "The viewing is not booked or confirmed until I have this information." to the user.
5NSEDTh7F9ru1vIpysSl - example of Gemini reading UUIDs
Looking at our call history, there were 18 calls taken in a production context yesterday. 6 were immediate disconnections by the user, so that leaves 12 customer or end-customer facing calls. Of the 12, only 1 didn't fall back to Gemini.
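As a quick check of the figures above, the fallback rate can be worked out as below. This is only arithmetic on the numbers quoted in the message; the variable names are illustrative.

```python
# Figures quoted in the message above.
total_calls = 18
immediate_disconnects = 6
without_fallback = 1

# Calls that actually reached a customer or end-customer.
answered_calls = total_calls - immediate_disconnects

# Calls in which the agent fell back to Gemini.
fell_back = answered_calls - without_fallback

fallback_rate = fell_back / answered_calls
print(f"{fell_back}/{answered_calls} calls fell back ({fallback_rate:.1%})")
```

That is 11 of 12 answered calls, which is far from the "rare" fallback described earlier in the thread.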
If we can crack these issues, our market is going to go wild for this product. Right now though we're facing pressure that the solution is not reliable enough.
Our usage target environment is telephony, so we are able to consider UX workarounds, such as having the agent say "Sorry, could you bear with me a moment please" or "Sorry, I lost you there, what did you say?" as natural methods of recovering from technical errors happening beneath the hood. Humans do this all the time. Approaches such as this may deal with complex failure cases such as the one you describe where failures occur mid-stream. As we have a socket on the conversation, we have custom layers above ElevenLabs where we are able to implement robustness ourselves where needed, providing ElevenLabs gives us enough control to do so. At the moment, I don't believe we have any control over the fallback behaviour. From a cursory glance at the documentation, it also does not appear that we can detect in real time when a fallback LLM is being used, or receive any event information on error.
Happy to discuss/chat about this with you. I am in the British Summer time zone (BST - GMT+1).
Are you in London?
Nottingham, UK. I am often in London and can arrange meetings there at natural opportunities to do so, or can make specific arrangements.
I see. I was just curious, I'm in the UK as well
anyway back to the topic
thanks a lot for the conversations, I'll see what I can do for you there
I appreciate your time.
customer choice of fallback LLMs should be quite easy and we've been considering this for a while
solving the mid-message transition issues is approximately 100x harder, but I'll try to look into your conversation logs and see if it is that problem (most likely not, this should be very very rare)
Thank you. There's a conversation in the history somewhere where we can see both LLMs responding, which might be the specific technical issue you are describing, but I couldn't find it just now.
In terms of resolution, it may be that being able to fall back to a Sonnet 3.5, or a GPT-4o, solves this for us.
I strongly suspect there is some other underlying issue causing your problems
I would be interested in an error event on the socket.
let me dig into this
FYI we have custom audio processing in front of you
and custom audio and transcription processing after you
Our solution stack is reasonably developed and given control, we can dig ourselves out of problems without troubling you.
It should not be necessary, this should just work
Of course. Thank you. As you will appreciate, sometimes stakeholder pressure requires same-day answers.
Control in our hands gives me recourse, and is always useful. But this is an aside.
but we are making progress with highlighting error messages, there was a PR merged today which should highlight crashes much better
I have been following your progress since December. Thank you for the work. I will let you get back to your day, and I look forward to your feedback.
Oki, so for the first example conversation Claude 3.7 is bouncing us with an "Overloaded" error message
would the ability to choose a fallback LLM help here?
in this example no matter how many times we send a request to Anthropic it's gonna fail
This follows. We don't have the data to be sure, but we believe we see an increased rate of failure at peak times of day.
that's why we try different providers when this happens
Indeed. So, my thoughts are that Claude 3.5 may be a different availability group. GPT-4o is a different provider.
Both of these as fallback LLMs have sufficient model intelligence to support our use case.
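The idea discussed here, trying another model or provider when the preferred one is overloaded, can be sketched roughly as follows. This is a minimal illustration and not ElevenLabs' implementation: `ProviderOverloaded`, `generate_with_fallback`, and the two stub models are all hypothetical names, and real provider calls would replace the stubs.

```python
from typing import Callable, List, Tuple

class ProviderOverloaded(Exception):
    """Hypothetical error for a provider that rejects a request outright."""

def generate_with_fallback(
    prompt: str,
    providers: List[Tuple[str, Callable[[str], str]]],
) -> Tuple[str, str]:
    """Try each (name, call) pair in order; return (name, reply) from the first success."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderOverloaded as exc:
            # Atomic failure: no tokens were generated, so moving on is safe.
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))

# Stub models for illustration only.
def always_overloaded(prompt: str) -> str:
    raise ProviderOverloaded("Overloaded")

def echo_model(prompt: str) -> str:
    return f"echo: {prompt}"

used, reply = generate_with_fallback(
    "Hello",
    [("preferred-model", always_overloaded), ("customer-chosen-fallback", echo_model)],
)
print(used, reply)
```

The sketch only covers failures that happen before any tokens are produced. The mid-message case mentioned earlier in the thread is harder, because part of a reply may already have been spoken.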
oki, I'll try to implement fallback choice soon!
In terms of the bulletproof solution, I would personally have an error event on the socket, or a critical error event on the socket.
We would disconnect ElevenLabs
we would tell the user "one moment please"
we would then reconnect and retry
If, due to circumstances out of our control, we experience total failure, we can redirect the call
Alongside your cascade mechanisms, and choice of model, I believe this would be bulletproof.
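The recovery steps listed above could look roughly like the sketch below on the customer side. It assumes an error event exists on the socket, which the thread says is not currently available, and every callable passed in (`disconnect`, `say_to_caller`, `reconnect`, `redirect_call`) is a hypothetical hook into the customer's own telephony layer, not an ElevenLabs API.

```python
from typing import Callable

def recover_call(
    disconnect: Callable[[], None],
    say_to_caller: Callable[[str], None],
    reconnect: Callable[[], bool],
    redirect_call: Callable[[], None],
    max_retries: int = 2,
) -> str:
    """Run when a (hypothetical) critical error event arrives on the socket.

    Returns "recovered" if a reconnect attempt succeeds, else "redirected".
    """
    disconnect()                         # drop the failed agent session
    say_to_caller("One moment please.")  # cover the gap naturally
    for _ in range(max_retries):
        if reconnect():                  # True means a fresh session is live
            return "recovered"
    redirect_call()                      # total failure: hand the call elsewhere
    return "redirected"

# Example with stubs: the first reconnect fails, the second succeeds.
attempts = iter([False, True])
spoken = []
outcome = recover_call(
    disconnect=lambda: None,
    say_to_caller=spoken.append,
    reconnect=lambda: next(attempts),
    redirect_call=lambda: spoken.append("<redirected>"),
)
print(outcome, spoken)
```

Keeping the hooks as plain callables means the same flow can sit above whatever socket and telephony stack is already in place.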
ideally on our side we limit total failures to 0.000001 % so that you don't have to worry about this
Thank you. If there are any indications of any reasons for our high incident rate of falling back, please do let me know. Have a good afternoon.
To resolve long silence issues, you may wish to decrease the "turn timeout" value from 15s to something smaller
for the first conversation provided, the long pause happens because LLM said something, and then it was the caller's turn to say something
only when it timed out after 15s (which is probably too much, our default is 7s), the agent could follow up
Have a look at prompting best practices, too
https://elevenlabs.io/docs/conversational-ai/best-practices/prompting-guide
Having a well-structured prompt with the list of tools at the bottom could help reduce LLM confusion as well
Claude 3.7 seems overloaded way too often. Could you please try with GPT-4 or Claude 3.5? They should be more reliable. I'll see what we can do with C 3.7
@mint bridge
Thank you. Our customer said this delay was 90 seconds, but I can see it is actually around 17-18 seconds. It was reported because the fallback LLM had said something which did not make sense to the user, so they held, expecting something to happen.
Our turn timeout has been set at 15 seconds as a workaround to an issue where the turn timeout applied even when it was the agent's turn to speak. So for example, if the timeout was 10 seconds, and it took the agent 8 seconds to speak its turn, the user would only have 2 seconds before the timeout occurred. Or, furthermore, if the agent took longer than 10 seconds to read its turn, the turn-timeout would immediately be buffered. Do you know if this issue has been fixed?
Duly noted, and we will experiment with our system prompt, though we are not having any issues with LLM reasoning and capability providing our agent is using the expected LLM.
@gilded coral the turn timeout is counted from when the agent finishes speaking
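The difference between the two timing behaviours discussed above can be shown with a small calculation. This is only a sketch of the arithmetic in the messages; the function names are illustrative and do not correspond to any ElevenLabs setting.

```python
def caller_window_from_turn_start(timeout_s: float, agent_speech_s: float) -> float:
    """Seconds left for the caller if the timer starts when the agent starts speaking."""
    return max(0.0, timeout_s - agent_speech_s)

def caller_window_from_speech_end(timeout_s: float, agent_speech_s: float) -> float:
    """Seconds left for the caller if the timer starts when the agent finishes speaking."""
    return timeout_s

# The example from the thread: 10 s timeout, agent speaks for 8 s.
print(caller_window_from_turn_start(10, 8))   # 2 seconds for the caller
print(caller_window_from_speech_end(10, 8))   # the full 10 seconds

# Agent speaks for longer than the timeout: no time left under the first behaviour.
print(caller_window_from_turn_start(10, 12))
```

If the timer does start when the agent finishes speaking, the 15-second workaround is no longer needed and a shorter value such as the 7-second default gives the caller the same window on every turn.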