#Token Cost Assistant API
1 messages · Page 1 of 1 (latest)
Hi @fathom axle. I'm kapa.ai, your friendly support bot. I will try to answer your question by searching through the knowledge base. :wave:
Yes, when you make a call to the API, you are billed for all the tokens in the API call, which includes the tokens in the system prompt, the conversation history, and the new messages. This is because the API is stateless and doesn't have a memory to store the context, so you need to send it with every request (source (https://community.openai.com/t/405972)).
However, the Assistants API automatically manages the context window such that you never exceed the model's context length. Once the size of the Messages in a Thread exceeds the context window of the model, the Thread will attempt to include as many messages as possible that fit in the context window and drop the oldest messages (source (https://platform.openai.com/docs/assistants/how-it-works)).
If you want to reduce the cost, you can try to make your context as short as possible. If your use case requires a more advanced level of control, you can manually generate summaries and control context with the Chat Completion API (https://platform.openai.com/docs/guides/text-generation/chat-completions-api) (source (https://platform.openai.com/docs/assistants/how-it-works)).