I'm getting response times of around 15 seconds (give or take about 5 seconds) from the chat completion endpoint with GPT-4. That range is pretty consistent for requests of 200 to 600 total tokens.
Is this normal? Of course there's some heavy-duty work going on during the call, but 15 seconds is far too long for my application (and, I suspect, for most applications).
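
For anyone who wants to compare numbers, here is a minimal sketch of how timings like these could be collected. It is not tied to any particular client library: `measure_latency` and `fake_completion` are hypothetical names, and the stand-in function only sleeps, so it would need to be swapped for the real chat completion request.

```python
import statistics
import time
from typing import Callable, List


def measure_latency(call: Callable[[], object], runs: int = 5) -> List[float]:
    """Return the wall-clock duration in seconds of each of `runs` calls."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        durations.append(time.perf_counter() - start)
    return durations


def fake_completion() -> str:
    # Placeholder (assumption): replace with the actual chat completion request.
    time.sleep(0.01)
    return "ok"


if __name__ == "__main__":
    samples = measure_latency(fake_completion, runs=5)
    print(
        f"mean={statistics.mean(samples):.3f}s "
        f"min={min(samples):.3f}s max={max(samples):.3f}s"
    )
```

Reporting the mean together with the min and max makes it easier to tell whether a "15s give or take 5s" figure is a steady baseline or a few slow outliers.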