I use the openai library to access ChatCompletion (with model gpt-3.5-turbo). In my application I have to respond to the user in less than 8 seconds, which is why I use stream=True.
Over the last few days I have noticed an increase in response time (mainly since April 25th and the "New ways to manage your data in ChatGPT" announcement). The main problem for my application is that the number of incomplete responses has increased a lot: if the response is still streaming when the 8 seconds are up, I send whatever I have collected so far to the end user and let them know that I could not collect the entire response. That will probably make many users stop using the application.
Is it really slower? If so, does OpenAI have any kind of solution?
Application: Portuguese skill for Alexa
Environment: Python 3.7 and openai 0.27.0
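
A minimal sketch of the deadline handling described above, for illustration only. The names `collect_with_deadline` and `fake_stream` are hypothetical, and the chunk layout (`choices[0]["delta"]["content"]` and `finish_reason`) is assumed to match what openai 0.27.x yields with stream=True. The fake stream stands in for the real API call so the sketch runs without network access.

```python
import time


def collect_with_deadline(chunks, deadline_seconds=8.0, clock=time.monotonic):
    """Accumulate streamed text until the stream ends or the deadline passes.

    Returns (text, complete): complete is False when the deadline cut the
    response short.
    """
    start = clock()
    parts = []
    for chunk in chunks:
        choice = chunk["choices"][0]
        # The first chunk usually carries only the role and the last one an
        # empty delta, so a missing "content" key is treated as empty text.
        parts.append(choice["delta"].get("content") or "")
        if choice.get("finish_reason") is not None:
            return "".join(parts), True
        if clock() - start >= deadline_seconds:
            return "".join(parts), False
    # Stream ended without an explicit finish_reason; treat it as complete.
    return "".join(parts), True


def fake_stream(pieces, delay=0.0):
    """Hypothetical stand-in that yields chunks shaped like the streamed ones."""
    for piece in pieces:
        time.sleep(delay)
        yield {"choices": [{"delta": {"content": piece}, "finish_reason": None}]}
    yield {"choices": [{"delta": {}, "finish_reason": "stop"}]}


# Assumed real usage with openai 0.27.x (not executed here):
#   stream = openai.ChatCompletion.create(
#       model="gpt-3.5-turbo", messages=messages, stream=True)
#   text, complete = collect_with_deadline(stream, 8.0)
```

Note that this sketch only checks the clock when a chunk arrives, so a stream that stalls completely would still block past the limit; guarding against that would need a separate timeout mechanism around the request.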