Token Cost Assistant API | OpenAI | Page 1

fathom axle · 2024-01-05T23:33:46.911Z

When I submit tool outputs in required_action status, will I be billed the complete thread context?

Hi @fathom axle. I'm kapa.ai, your friendly support bot. I will try to answer your question by searching through the knowledge base. :wave:

Yes, when you make a call to the API, you are billed for all the tokens in the API call, which includes the tokens in the system prompt, the conversation history, and the new messages. This is because the API is stateless and doesn't have a memory to store the context, so you need to send it with every request (source (https://community.openai.com/t/405972)).

However, the Assistants API automatically manages the context window such that you never exceed the model's context length. Once the size of the Messages in a Thread exceeds the context window of the model, the Thread will attempt to include as many messages as possible that fit in the context window and drop the oldest messages (source (https://platform.openai.com/docs/assistants/how-it-works)).

If you want to reduce the cost, you can try to make your context as short as possible. If your use case requires a more advanced level of control, you can manually generate summaries and control context with the Chat Completion API (https://platform.openai.com/docs/guides/text-generation/chat-completions-api) (source (https://platform.openai.com/docs/assistants/how-it-works)).

Please note that OpenAI is planning to add the ability for users to control the input/output token count beyond the model they select, but this feature is not available yet (source (https://community.openai.com/t/534869)).

#Token Cost Assistant API

Please note that OpenAI is planning to add the ability for users to control the input/output token count beyond the model they select, but this feature is not available yet (source (https://community.openai.com/t/534869)).

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon: