#Rate limit


glacial oriole
#

Hi - another beginner question that I think I understand, but I wanted clarification. One of our main use cases is using multiple LLMs as failover for each other, since GPT's rate limits hinder our workflows. I'm trying to understand how this works in OR:

  1. I believe we can pay per usage through OR without having to get our own API keys directly from each LLM provider, correct?
  2. If I have higher rate limits on my own OpenAI account, should I use my own key instead of OR for OpenAI specifically, or would throughput be the same?
  3. Does OR have higher rate limits with each LLM provider than the baseline rate limits they'd give end users directly?
  4. Does failover between models just follow the order of the models listed in the API call when failover is requested, or what order does it go in? I assume no failover model is used unless the previous model timed out?

Thanks for the guidance!

golden fiber
#

1 - Yes
2 - Throughput likely similar
3 - Docs on rate limit: https://openrouter.ai/docs#limits
4 - The fallback router is only triggered when the request to the target model is flagged/moderated upstream. We will consider timeouts/errors soon - good suggestion! cc @hushed torrent

glacial oriole
#

@golden fiber thanks for the feedback! I'm currently using another product, edenai, because it handles failover for timeouts/errors typically due to rate limits. It sounds like your native API call won't use fallbacks for timeouts/errors, which is my main use case. Is there any other way to achieve this with openrouter, or is it better to stick with edenai for now?

golden fiber
#

"handles failover for timeouts/errors typically due to rate limits"

@glacial oriole can you elaborate a bit on how they handle this?

golden fiber
#

cc @hushed torrent

hushed torrent
#

Yeah this is great feedback, a pretty clear next step for our fallback router

#

@glacial oriole we'll work on this and get back to you soon

glacial oriole
#

@hushed torrent @golden fiber They just do sequential fallback based on the fallback providers you've included: if provider A fails, it goes to provider B, and so on. Here's a screenshot of their explanation. I also use the tool, so if you need to see anything else, happy to share!

hushed torrent
#

How can Amazon be an OpenAI provider?

#

which models are being used in the openai/microsoft/amazon provider cases?

glacial oriole
#

@hushed torrent So Amazon isn't an OpenAI provider, but it's a fallback provider. In this example it'll try OpenAI; if that fails it goes to the fallback provider Microsoft, and if that fails it goes to Amazon. You can fall back to any provider, and you can call out which model you want to use from each provider. So for OpenAI I tell them I want to run 3.5 turbo, for Google I'm using chat-bison, etc.
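
For reference, the sequential provider fallback described above can be sketched roughly as follows. This is only an illustration, not Eden AI's or OpenRouter's actual implementation: `complete_with_fallback`, the `call` signature, and `fake_call` are all hypothetical names made up for this sketch, and no real network request is made.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical signature: (provider, model, prompt) -> completion text.
# A real client would wrap each provider's SDK or HTTP API behind this.
CallFn = Callable[[str, str, str], str]


def complete_with_fallback(
    targets: Sequence[Tuple[str, str]], prompt: str, call: CallFn
) -> Tuple[str, str]:
    """Try each (provider, model) pair in order; return (provider, text) from the first success."""
    errors: List[str] = []
    for provider, model in targets:
        try:
            return provider, call(provider, model, prompt)
        except Exception as exc:  # any failure moves on to the next provider
            errors.append(f"{provider}/{model}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


def fake_call(provider: str, model: str, prompt: str) -> str:
    """Stand-in for a real API call: the primary provider always times out."""
    if provider == "openai":
        raise TimeoutError("rate limited")
    return f"[{provider}:{model}] {prompt}"


targets = [("openai", "gpt-3.5-turbo"), ("google", "chat-bison")]
provider, text = complete_with_fallback(targets, "hi", fake_call)
```

Here the primary fails, so the result comes from the second entry in the list; order in `targets` is the fallback order.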

hushed torrent
#

oh and on each provider it looks like you select a different model

#

for openrouter it works a bit differently - you specify a list of models that you want to fall back to

#

on our end, we're figuring out the best providers for developers. We do fallbacks today for 429s and 500s for the open source models, but will do fallbacks too for the closed models in the future
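
To make the model-list idea concrete, here is a rough sketch of falling back through a list of models only when the response status is 429 or 500, as mentioned above. It is an assumption-laden illustration, not OpenRouter's router: `route`, `send`, `FALLBACK_STATUSES`, and `fake_send` are hypothetical names, and treating exactly these two codes as the trigger is an assumption taken from the message above.

```python
from typing import Callable, Sequence, Tuple

# Assumption: only these statuses trigger a fallback (rate limit, server error).
FALLBACK_STATUSES = {429, 500}

# Hypothetical signature: (model, prompt) -> (HTTP status, response body).
SendFn = Callable[[str, str], Tuple[int, str]]


def route(models: Sequence[str], prompt: str, send: SendFn) -> Tuple[str, str]:
    """Return (model, body) from the first model in the list that answers with 200."""
    for model in models:
        status, body = send(model, prompt)
        if status == 200:
            return model, body
        if status not in FALLBACK_STATUSES:
            # e.g. a 400 is the caller's problem; trying another model won't help
            raise RuntimeError(f"{model} failed with non-retryable status {status}")
    raise RuntimeError("every model in the list returned 429 or 500")


def fake_send(model: str, prompt: str) -> Tuple[int, str]:
    """Stand-in for a real request: the first model is rate limited."""
    if model == "gpt-3.5-turbo":
        return 429, "rate limit exceeded"
    return 200, f"{model} says: {prompt}"


model, body = route(["gpt-3.5-turbo", "chat-bison"], "hi", fake_send)
```

The difference from a plain try-everything loop is the status check: a client error stops immediately instead of burning through the rest of the list.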

glacial oriole
#

Makes sense, yeah, you wouldn't necessarily need to call out a specific provider and could instead call out a specific list of models to fall back to, e.g. gpt 3.5 turbo as primary and chat-bison (google) as fallback. Here's their documentation if this helps guide you as you roll this out in the future. https://docs.edenai.co/reference/text_chat_create

#

Ohh ok, so you already have this feature for 429s (rate limits) and 500s (server errors), but only for open source models and not for all models like GPT, etc.?