I am asking a prompt to ChatGPT. When I ask it from the web app, it takes 1 second. https://chat.openai.com/
When I ask this same prompt from the python API, it takes minutes, and sometimes it fails after 10 min! Why is the API so much slower than the website?
Also, I need to make quite a lot of queries. Is there a way that the queries can be run faster, even if by paying more? and what about the rate limit, to run queries in parallel? many thanks, David
#openai.ChatCompletion.create so slow, and improve rate limit
12 messages · Page 1 of 1 (latest)
You might be doing something wrong if it takes 10 minutes. I did speed tests recently and consistently clock between 1 and 3 seconds for sub 1k token requests.
Regarding timeout, use a library to automatically retry w/ exponential backoff
What model are you using in the API calls, David? And are you using the streaming mode?
with gpt-3.5-turbo, it takes 100 seconds in average when using the API, and it takes 1 second when using the web app. with with gpt-3.5-turbo-16, it takes 10 min using the API.
i am not using the streaming mode. i am using openai.ChatCompletion.create.
Post a sample call. Perhaps you like cranked the n parameter to some crazy number?
It shouldn't be taking 100 sec even if you're making a request with max possible tokens.
i use max_tokens=3000.
using a lower max_tokens would be faster, even of the response is let's say 100 tokens?
No, limiting max tokens would only help if the original response was supposed to be higher.
Anyways, there's something wrong with your setup. it shouldn't be taking 100 seconds no matter what.
ok, i'll check in another computer and internet connection, thx
and running a query on gpt-3.5-turbo-16k is slower to run than gpt-3.5-turbo, if the response length is the same?
I think so. But I haven't tested it, so not sure.