#Slow API - Being charged for exponential backoff retries?


vernal quail
#

I just tested this: I set up a loop that sent 50 requests of ~3k tokens each via fetch in Node, aborting each one with AbortController after 1 second. Only 1 went through without aborting, but I got charged for (almost) all 50. That means any latency "fix" that times out and retries with exponential backoff is probably costing $ in credit usage on each retry. I kinda understand, since the model was working during that time, but it makes this unscalable for high usage when the model is responding slowly: it literally doubles or triples token usage. I will repeat the test later in a more advanced test harness to double-check this result, but wanted to ask whether this is accurate?
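For anyone who wants to reproduce this, here is a rough sketch of the kind of loop described above. It is not the exact script from the test: the function names, the `fetchImpl` parameter, and the choice to send requests one after another are assumptions for illustration, and the URL and request options are left to the caller.

```javascript
// Sketch only: function names and parameters are hypothetical.
// Sends one request and aborts it if it has not settled within timeoutMs.
async function fetchWithTimeout(url, options, timeoutMs, fetchImpl = fetch) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await fetchImpl(url, { ...options, signal: controller.signal });
  } finally {
    // Always clear the timer so it cannot fire after the request settles.
    clearTimeout(timer);
  }
}

// Sends `count` requests sequentially and tallies how each one ended.
async function runAbortTest(url, options, { count = 50, timeoutMs = 1000, fetchImpl = fetch } = {}) {
  const tally = { completed: 0, aborted: 0, failed: 0 };
  for (let i = 0; i < count; i++) {
    try {
      await fetchWithTimeout(url, options, timeoutMs, fetchImpl);
      tally.completed++;
    } catch (err) {
      // An aborted fetch rejects with an error named "AbortError".
      if (err.name === "AbortError") tally.aborted++;
      else tally.failed++;
    }
  }
  return tally;
}
```

Note that "completed" here only means the response headers arrived before the timeout; the body is not read. The tally also says nothing about billing by itself, so it has to be compared against the usage dashboard after the run.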

#

Part of the more advanced test harness would involve checking with Wireshark that Node actually kills the HTTP connection (the internet says it does when using fetch and AbortController). But if you tell me this is how the charging model behaves, then there's no need for the effort on my part 🙂