#tokens-calculation in my assistant_api

1 messages · Page 1 of 1 (latest)

versed oasis Feb 22, 2024, 9:38 PM

i have some issues regarding the tokens-calculation in my assistant_api and i need some help urgently!

I have created an assistant based on the GPT-4-1106-preview model with custom knowledge base (currently two pdf files).
Until now i did not have any issues, but since last month, the tokens-calculation regarding my input (question) got so drasticly high that it cost me way too much money.

My normal use case is that i enter an input of max. 150 words and get a response of max. 500 words.
Until now, whenever i asked a question with one sentence (4-5 words) and got the response of like 350-400 words the amount of used tokens was round about 700-900 tokens, which is perfectly fine!

Now if i ask the same question with the same 4-5 words, it counts the input_prompt lenght up to over 30.000 tokens !!! But the output tokens are still calculated accordingly !
So the main problem is that the input_prompt somehow adds thousends of tokens to the actuall question_prompt!

strong vortexBOT Feb 22, 2024, 9:38 PM

Hi @versed oasis. I'm kapa.ai, your friendly support bot. I will try to answer your question by searching through the knowledge base. :wave:

The token usage in the Assistant API can be influenced by several factors. Here are some key points based on the information extracted from the sources:

Token Calculation: Both input and output tokens count toward the total tokens used. For example, if your API call used 10 tokens in the message input and you received 20 tokens in the message output, you would be billed for 30 tokens. However, it's important to note that the token calculation is not solely based on the input and output messages. The Assistant API also uses tokens for internal functions like retrieval and other tools, which can add to the total token count (source (https://platform.openai.com/docs/guides/text-generation)).
Message Formatting: Chat models like gpt-3.5-turbo and gpt-4-turbo-preview use tokens in the same way as the models available in the completions API, but because of their message-based formatting, it's more difficult to count how many tokens will be used by a conversation. Each message passed to the API consumes the number of tokens in the content, role, and other fields, plus a few extra for behind-the-scenes formatting (source (https://platform.openai.com/docs/guides/text-generation)).
Token Overhead: The chat completions messages are wrapped in a container of tokens along with the name of the role, and in addition, the AI is prompted with more tokens where it is supposed to write its answer. That gives a token overhead of 7 tokens per first message, and 4 for each additional (source (https://community.openai.com/t/595970)).

Retrieval Strategy: The Assistant API currently optimizes for quality by adding all relevant content to the context of model calls. This means that all content that fits within the model’s context length is considered, which can significantly increase the token usage (source (https://community.openai.com/t/595953)).
Billing: The tokens used for the Assistant API are billed at the chosen language model’s per-token input/output rates. You get billed by the tokens in addition to the per day file retrieval costs. And with the Assistant API, you have no control over how much it decides to retrieve (source (https://community.openai.com/t/553660)).

Given these factors, it's possible that the increase in token usage you're experiencing is due to the Assistant API's retrieval strategy and the token overhead associated with message formatting. However, without more specific information about your use case, it's difficult to pinpoint the exact cause.

You might want to check the usage field in the API response (e.g., response**'usage']['total_tokens']) to see how many tokens are used by an API call. If you want to see how many tokens are in a text string without making an API call, you can use OpenAI’s [tiktoken** (https://github.com/openai/tiktoken) Python library (source (https://platform.openai.com/docs/guides/text-generation)).

#tokens-calculation in my assistant_api

If the token usage remains unexpectedly high, it would be best to reach out to OpenAI support for further assistance.

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon: