#openai.ChatCompletion.create so slow, and improve rate limit

12 messages · Page 1 of 1 (latest)

median jewel
#

I am asking a prompt to ChatGPT. When I ask it from the web app, it takes 1 second. https://chat.openai.com/
When I ask this same prompt from the python API, it takes minutes, and sometimes it fails after 10 min! Why is the API so much slower than the website?
Also, I need to make quite a lot of queries. Is there a way that the queries can be run faster, even if by paying more? and what about the rate limit, to run queries in parallel? many thanks, David

tender leaf
#

You might be doing something wrong if it takes 10 minutes. I did speed tests recently and consistently clock between 1 and 3 seconds for sub 1k token requests.

Regarding timeout, use a library to automatically retry w/ exponential backoff

pine bobcat
median jewel
#

with gpt-3.5-turbo, it takes 100 seconds in average when using the API, and it takes 1 second when using the web app. with with gpt-3.5-turbo-16, it takes 10 min using the API.

#

i am not using the streaming mode. i am using openai.ChatCompletion.create.

tender leaf
#

Post a sample call. Perhaps you like cranked the n parameter to some crazy number?

pine bobcat
#

It shouldn't be taking 100 sec even if you're making a request with max possible tokens.

median jewel
pine bobcat
#

No, limiting max tokens would only help if the original response was supposed to be higher.
Anyways, there's something wrong with your setup. it shouldn't be taking 100 seconds no matter what.

median jewel
#

ok, i'll check in another computer and internet connection, thx

#

and running a query on gpt-3.5-turbo-16k is slower to run than gpt-3.5-turbo, if the response length is the same?

pine bobcat
#

I think so. But I haven't tested it, so not sure.