Hi - another beginner question that I think I understand but wanted to confirm. One of our main use cases is using multiple LLMs as failover for each other, since GPT's rate limits hinder our workflows. I'm trying to understand how this works in OR:
- I believe we can pay per usage through OR without having to get our own API keys directly from each LLM provider, correct?
- If I have higher rate limits on my own OpenAI account, should I use my own key instead of OR for OpenAI specifically, or would throughput be the same?
- Does OR have higher rate limits with each LLM provider than the baseline rate limits those providers give end users directly?
- Does failover between models follow the order of the models listed in the API call when failover is requested, or is the order determined some other way? I assume a failover model is only used if the previous model timed out?
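
To make the last question concrete, here is a rough Python sketch of the behaviour I'm assuming. It is not OR's actual API: `complete_with_failover`, `call_model`, and the model names are hypothetical placeholders, and `call_model` stands in for whatever request really gets sent.

```python
from typing import Callable, Sequence, Tuple


def complete_with_failover(
    models: Sequence[str],
    prompt: str,
    call_model: Callable[[str, str], str],  # hypothetical: (model, prompt) -> reply
) -> Tuple[str, str]:
    """Try each model in list order; return (model_used, reply) from the first success."""
    last_error = None
    for model in models:
        try:
            return model, call_model(model, prompt)
        except TimeoutError as err:
            # Assumption: only a timeout moves on to the next model in the list.
            last_error = err
    raise RuntimeError("all models timed out") from last_error


if __name__ == "__main__":
    def fake_call(model: str, prompt: str) -> str:
        # Stand-in for a real request: the first model always times out.
        if model == "model-a":
            raise TimeoutError("simulated timeout")
        return f"{model} answered: {prompt}"

    print(complete_with_failover(["model-a", "model-b"], "hello", fake_call))
```

If that matches what OR does, the second model would only ever be called after the first one times out, and never otherwise.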
Thanks for the guidance!