#Organization Budget isn't a hard limit?


robust pier
#

I believe it's a fair assumption that if you set a budget, requests would start to fail (be rejected) once you hit that limit.
However, this does not seem to be the case: my budget was reached, yet requests kept going through, which drained all my leftover credits.

It's not a big deal, as there were only $15 in credits left. However, the damage could have been a lot worse.

Is this the expected behavior? Was I being naive to expect it to be a hard limit?

hasty aspen
#

Hi @robust pier, I'm just wondering whether you were using any async APIs, by chance?

robust pier
#

My API keys were being used in two ways: 1) through a Python application running PydanticAI async agents exposed through FastAPI, and 2) with opencode (https://github.com/anomalyco/opencode).

#

The main token usage at the time was through opencode

hasty aspen
#

@robust pier to be transparent about my thinking: I'm wondering whether the async APIs left you with a batch of in-flight requests whose usage the API couldn't evaluate until they finished, so that by the time they completed you were already past your cap. Did you check your usage totals in the dashboard to see whether you actually went over the limit, and if so, by how much?
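To make that hypothesis concrete, here is a toy simulation of it. This is only a sketch of the suspected mechanism, not a description of how any provider actually enforces budgets: `SoftBudget`, `fake_request`, and `simulate` are made-up names, and the costs and delay are arbitrary.

```python
import asyncio


class SoftBudget:
    """Toy model (assumption): usage is recorded only when a request finishes."""

    def __init__(self, limit: float) -> None:
        self.limit = limit
        self.spent = 0.0

    def allows_new_request(self) -> bool:
        # The admission check sees completed usage only, not in-flight requests.
        return self.spent < self.limit


async def fake_request(budget: SoftBudget, cost: float) -> bool:
    if not budget.allows_new_request():
        return False  # rejected at admission time
    await asyncio.sleep(0.01)  # request is in flight; its cost is not recorded yet
    budget.spent += cost  # usage is settled only on completion
    return True


async def simulate(limit: float, n_requests: int, cost: float) -> SoftBudget:
    budget = SoftBudget(limit)
    # Launch every request concurrently, as an async agent or coding tool might.
    await asyncio.gather(*(fake_request(budget, cost) for _ in range(n_requests)))
    return budget


budget = asyncio.run(simulate(limit=10.0, n_requests=25, cost=1.0))
print(f"limit={budget.limit} spent={budget.spent}")
```

In this model all 25 requests pass the check before any of them has been billed, so total spend ends up well past the limit even though every individual admission check looked fine.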

robust pier
# robust pier Yes, $15 which was the remaining balance at the time.

For clarity.

The API calls that triggered going above the limits were in OpenCode usage with regular file edits.

Yes, it did indeed go over the limit, and it kept going until the remaining credits ($15) were drained. This was reflected on my dashboard.

I did receive emails notifying me that my limit had been reached. (I didn't read them immediately. I only saw them after the fact, although they were sent at the correct time relative to the usage.)

hasty aspen
#

Unfortunately, I believe this is just the nature of these APIs. I looked into it, and other users are reporting similar cases where usage goes past the hard limit set in their budget. It seems to be the same on other platforms such as Gemini and Claude, so it may be an architectural issue with these kinds of APIs.

zinc raft
# robust pier I believe its a fair assumption that if you set a budget, the request would star...

"Oof, I feel this pain. 😅 We got burned by the same thing early on.

Budget limits aren't truly hard — in-flight async requests can blow past them before the API catches up. OpenAI's docs say it's a 'soft limit' but that's not obvious until you learn it the hard way.

Two things that helped us:

  1. Local rate limiting — We cap requests at our end before they even hit OpenAI
  2. Caching layer — 60%+ of our queries were repeats. Caching them means fewer API calls = less chance of overspend

Now we sleep better knowing we control the throttle, not just hoping OpenAI catches it in time.

Happy to share what we learned if useful!"
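For anyone who wants to try the two ideas above (a local cap plus caching), here is a minimal sketch of one way to do it. Everything in it is hypothetical: `LocalSpendGuard`, `BudgetExceeded`, `max_cost_usd`, and the `send` callable are illustration names, not part of any SDK. It assumes the caller can estimate a worst-case cost per request (for example from a max-token setting) and that a failed request is not billed.

```python
import asyncio
from typing import Awaitable, Callable


class BudgetExceeded(Exception):
    """Raised locally, before any request leaves the process."""


class LocalSpendGuard:
    """Reserves a worst-case cost up front so in-flight calls count against the cap."""

    def __init__(self, cap_usd: float) -> None:
        self.cap_usd = cap_usd
        self.committed_usd = 0.0  # settled spend plus outstanding reservations
        self._cache: dict[str, str] = {}

    async def call(
        self,
        prompt: str,
        max_cost_usd: float,
        send: Callable[[str], Awaitable[tuple[str, float]]],
    ) -> str:
        if prompt in self._cache:
            return self._cache[prompt]  # repeated query: no API call, no spend
        if self.committed_usd + max_cost_usd > self.cap_usd:
            raise BudgetExceeded(f"local cap of ${self.cap_usd:.2f} would be exceeded")
        # Reserve before awaiting, so concurrent callers see this request's cost.
        self.committed_usd += max_cost_usd
        try:
            text, actual_cost = await send(prompt)
        except BaseException:
            # Assumption: a failed or cancelled request is not billed.
            self.committed_usd -= max_cost_usd
            raise
        # Swap the reservation for the real cost once it is known.
        self.committed_usd += actual_cost - max_cost_usd
        self._cache[prompt] = text
        return text


async def fake_send(prompt: str) -> tuple[str, float]:
    """Stand-in for a real API call; returns (text, actual cost in USD)."""
    await asyncio.sleep(0.01)
    return f"echo: {prompt}", 0.5


async def demo() -> tuple[int, int, float]:
    guard = LocalSpendGuard(cap_usd=3.0)
    results = await asyncio.gather(
        *(guard.call(f"q{i}", 1.0, fake_send) for i in range(5)),
        return_exceptions=True,
    )
    ok = sum(isinstance(r, str) for r in results)
    rejected = sum(isinstance(r, BudgetExceeded) for r in results)
    return ok, rejected, guard.committed_usd


ok, rejected, committed = asyncio.run(demo())
print(ok, rejected, committed)
```

The key design choice is reserving the worst-case cost before the `await`: the check and the reservation happen without yielding to the event loop, so concurrent requests cannot all slip past the cap the way they can when usage is only counted on completion. This guard is per process, so several applications sharing one API key would each need their own cap or a shared store.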