hey there! seems this has been posted a few times, but have yet to see a solution set that applies for me. i’m building an app (a discord bot) that is cloud hosted with a powerful vm, and runs GPT-4 (8k). i am not a free user. api calls tend to take several minutes to fully complete— and occasionally i’ll get the error message attached. curious if other folks have found solution sets here, or if it truly rests entirely with openai as they scale to meet the massive demand. thank you all! would love to contribute to finding solves here, or being pointed toward solutions that i may have missed in my search.
#Incredibly slow api calls for gpt-4, gpt-3.5. Hoping to contribute to solving this!
Hey! Response time varies widely sometimes during the week and can take a long time especially if you're using / expecting a lot of tokens for your request
Can you share some of your request IDs that were failing?
thank you for the response! this one is from 10s ago @dusky storm
Can you share the prompt or similar prompts used in that request?
@dusky storm can i dm or call you?
Hey jet. I just tried out the gpt-3.5-turbo model for a chatbot experience. It responded way slower than text-davinci-003. Did you solve it? Would you mind sharing how?
nothing yet, unfortunately
Is your stream parameter set to true or false?
Hey, Genius. Would you mind explaining what the stream parameter is? I'm new to the GPT-3.5 API.
You can set stream to be true or false. If you set it to false (which is also its default state), then you won't get a response until it's fully formed/complete. That could be a reason the API calls feel slow.
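A minimal sketch of why this can matter, using a simulated stream rather than the real API: `fake_token_stream` is a hypothetical stand-in for a model that emits one token at a time, and the delay value is made up. It only illustrates that with streaming the first piece can arrive long before the full response is complete, while the total time stays roughly the same.

```python
import time
from typing import Iterator, List, Tuple


def fake_token_stream(tokens: List[str], delay_s: float) -> Iterator[str]:
    """Hypothetical stand-in for a model emitting one token every delay_s seconds."""
    for tok in tokens:
        time.sleep(delay_s)
        yield tok


def time_to_first_and_total(stream: Iterator[str]) -> Tuple[float, float, str]:
    """Consume a stream; return (seconds to first token, total seconds, full text)."""
    start = time.perf_counter()
    first = None
    parts = []
    for tok in stream:
        if first is None:
            first = time.perf_counter() - start  # when a streaming client could start displaying text
        parts.append(tok)
    total = time.perf_counter() - start  # when a non-streaming client would get anything at all
    return (first if first is not None else total), total, "".join(parts)


tokens = ["Hello", ",", " ", "world", "!"]
first_s, total_s, text = time_to_first_and_total(fake_token_stream(tokens, 0.01))
print(f"first token after {first_s:.3f}s, full text after {total_s:.3f}s: {text!r}")
```

With stream set to false the caller only sees the result at roughly `total_s`; with stream set to true it can start showing output at roughly `first_s`.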
Thanks for replying 🥺. So will this "stream" setting cost MORE tokens per completion in exchange for speed?
for me it always depends on the size of the input, let's say I throw gpt-4 a whole python function and ask it to perform static code analysis or identify possible ways it could be abused by users. sometimes it just times out and tells me the model's overloaded, sometimes it just takes a minute or so
it seems to be random to me, if I'm calling the api at 3am or 3pm I personally don't notice a difference in how many times it tells me there's an error. pretty much only happens when I'm throwing a lot of text at it
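For the intermittent "model's overloaded" errors and timeouts described above, one common mitigation is retrying with exponential backoff. This is only a sketch of that general technique, not something confirmed as a fix in this thread: `OverloadedError`, `call_with_backoff`, and `flaky` are hypothetical names, and in a real bot you would catch whatever exception your client library actually raises.

```python
import random
import time


class OverloadedError(Exception):
    """Hypothetical stand-in for an 'overloaded' or timeout error from the API."""


def call_with_backoff(fn, max_attempts=5, base_delay_s=0.01, sleep=time.sleep):
    """Call fn(); on OverloadedError wait base_delay_s * 2**attempt (plus jitter) and retry."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except OverloadedError:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error to the caller
            delay = base_delay_s * (2 ** attempt)
            sleep(delay + random.uniform(0, delay))  # jitter so clients don't retry in lockstep


# Simulated request that fails twice, then succeeds (assumption, for illustration only).
attempts = {"n": 0}


def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise OverloadedError("model overloaded")
    return "ok"


result = call_with_backoff(flaky)
print(result, attempts["n"])  # prints "ok 3"
```

The delays here are tiny so the example runs quickly; for real API calls a base delay of a second or more is probably more sensible.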
how to set stream to be true? thanks a lot:)
in python
it’s just stream=True, (capital T, since it's a Python boolean)
but the latency delta may be negligible (or impactful!) depending on your application; it may also help as a preventative measure for timeouts. unsure what the impact (if any) would be if devs used streaming by default, though
https://github.com/openai/openai-cookbook/blob/main/examples/How_to_stream_completions.ipynb @spice wadi @stiff cosmos
yea, i’m currently dealing with completions taking on the order of multiple minutes 😅 but may be due to the length of input tokens
Thanks for this useful information. In my opinion the stream method isn't that useful unless the application emulates the typing style the way ChatGPT does 🙂. Really thankful for the sharing.
I'm not sure what causes the inconsistent completion times. But from my testing, I guess there are peak times when most users are active, and the model becomes very slow to respond under the massive number of requests. It makes for a painful experience, especially for customer-facing applications.