We are using the v1/completions API and getting rate limited. We have a prod application that makes about 3,800 requests per day, but somehow we are still getting rate limited. We have looked at our numbers and we are nowhere close to the 3500 RPM limit or even the token limit. We have requested a rate limit increase, but we doubt that would solve the problem. Does anyone have insights into this? Thanks
#Rate limiting
Is it intermittent? If so, it could just be the endpoint is overloaded with requests from everyone.
Thank you, I discovered that in another thread. The 429 seems misleading, though. When the server is overloaded, they should respond with an HTTP status code in the 500s.
Lately it has been frequent; we are seeing a 30 to 40% error rate daily.
So this is a general overload? Why doesn't the status page show anything? It's been like this for the whole day.
We've been having the same issue with davinci-003, and it says:
Error: Request failed with status code 429
statusMessage: 'Too Many Requests',
'x-ratelimit-limit-requests': '3000',
'x-ratelimit-remaining-requests': '2999',
'x-ratelimit-limit-tokens': '250000',
'x-ratelimit-remaining-tokens': '249500',
'x-request-id': '1433e1c62ba68cb51ae9a81d9875d808',
'x-ratelimit-reset-requests': '20ms',
'x-ratelimit-reset-tokens': '120ms',
Paid API account, critical level issue in production environment! 😬
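For anyone who wants to check the same thing on their side, here is a minimal sketch of reading those headers to see whether the 429 lines up with the account's own budget. The helper names `parse_reset` and `quota_exhausted` are hypothetical, and the reading of the header values (remaining counts, reset durations like `20ms`) is an assumption based on the header names in the dump above, not on documented behavior.

```python
import re

# Milliseconds per unit for duration strings like "20ms", "1s", "1m30s".
# The accepted units are an assumption based on the values seen in the dump.
_UNIT_MS = {"ms": 1, "s": 1000, "m": 60_000, "h": 3_600_000}


def parse_reset(value: str) -> float:
    """Convert a reset duration such as '20ms' or '1m30s' to seconds."""
    parts = re.findall(r"(\d+(?:\.\d+)?)(ms|s|m|h)", value)
    total_ms = sum(float(number) * _UNIT_MS[unit] for number, unit in parts)
    return total_ms / 1000


def quota_exhausted(headers: dict) -> bool:
    """Return True only if the headers report a request or token budget at zero."""
    remaining_requests = int(headers.get("x-ratelimit-remaining-requests", "1"))
    remaining_tokens = int(headers.get("x-ratelimit-remaining-tokens", "1"))
    return remaining_requests <= 0 or remaining_tokens <= 0


# Values copied from the error output posted above.
headers = {
    "x-ratelimit-limit-requests": "3000",
    "x-ratelimit-remaining-requests": "2999",
    "x-ratelimit-limit-tokens": "250000",
    "x-ratelimit-remaining-tokens": "249500",
    "x-ratelimit-reset-requests": "20ms",
    "x-ratelimit-reset-tokens": "120ms",
}

if not quota_exhausted(headers):
    # Budget is nearly untouched, so the 429 probably has another cause.
    print("429 received, but the reported quota is not exhausted")
```

With the posted values the budget is almost untouched (2999 of 3000 requests remaining), which fits the overload theory earlier in the thread rather than a real per-account limit.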
Have you come across any workarounds? I'm faced with the same problem
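One common workaround for intermittent 429s is retrying with exponential backoff and jitter. Below is a minimal sketch, not something anyone in this thread confirmed fixes the problem. `send` is a hypothetical zero-argument callable that performs the request and returns `(status_code, body)`; the set of retryable status codes and the delay values are assumptions to tune.

```python
import random
import time


class RetryableError(Exception):
    """Raised when every attempt returned a retryable status."""


def call_with_backoff(send, max_attempts=5, base_delay=0.5, max_delay=8.0,
                      sleep=time.sleep, rng=random.random):
    """Call send() until it returns a non-retryable status or attempts run out.

    send is a hypothetical callable returning (status_code, body).
    """
    retryable = {429, 500, 502, 503}  # assumed set of transient statuses
    status = None
    for attempt in range(max_attempts):
        status, body = send()
        if status not in retryable:
            return status, body
        if attempt < max_attempts - 1:
            # Exponential backoff capped at max_delay, scaled by random jitter
            # so that many clients do not retry at the same instant.
            delay = min(max_delay, base_delay * 2 ** attempt) * rng()
            sleep(delay)
    raise RetryableError(
        f"gave up after {max_attempts} attempts, last status {status}"
    )
```

Usage would look like `call_with_backoff(lambda: my_request())`, where `my_request` is whatever function wraps your completions call. This only smooths over intermittent failures; with a 30 to 40% error rate it will add noticeable latency.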
Migrated to Azure and ran a test with the same task.
Azure - 39 sec
OpenAI - 1 min 31 sec
Azure took less than half the time (39 sec vs 91 sec, roughly 57% less) and hopefully doesn't have the general rate limit problems. One thing to keep in mind is that the RPM and TPM limits for users are a lot lower on Azure than on OpenAI.
So you are using Azure to generate responses rather than OpenAI? How is the quality?