Hey, so I was looking at the pricing for the API page and I'm confused. As I actual have a conversation, how is the price likely to be affected? Do I keep paying for everything in the context window again with every new message as the conversation continues? If so, what's the limit? How expensive should I expect these conversations to become?
#API pricing explanation check
1 messages · Page 1 of 1 (latest)
Hi @inner ridge. I'm kapa.ai, your friendly support bot. I will try to answer your question by searching through the knowledge base. :wave:
Yes, you're correct in your understanding. When you're using the OpenAI API, the cost is determined by the number of tokens processed. This includes both the input and output tokens. A token can be as short as one character or as long as one word. For example, "ChatGPT is great!" is encoded into six tokens: ["Chat", "G", "PT", " is", " great", "!"].
In the context of a conversation, each subsequent question you ask includes the entire history of your conversation. This means that with each additional question, you're essentially re-asking all of the previous questions and providing all of the previous answers, then asking the next question. This continues up to the context window of the model, at which point your application needs to start pruning the chat history, usually by removing the oldest messages and answers. This is how GPT can follow the conversation from question to question.
The cost can add up quickly, especially if you're using larger models like GPT-4. There are some ways to manage this, such as maintaining a smaller chat history, which might put a cap on the overall size of any given request. However, this will make the bot behave more "forgetful" because it won't maintain your previous answers in its history for as long.
If you're having an extended conversation and want the context of your previous questions and answers, there's no easy way to achieve this with a large language model (today) that doesn't involve this repetitive and token-intensive approach of re-processing the entire conversation with each request.
You can find the specific pricing for different models on the OpenAI Pricing page (https://openai.com/pricing). For example, for the gpt-3.5-turbo-1106 model, the cost is $0.0010 per 1K tokens for input and $0.0020 per 1K tokens for output.
To manage costs, you could consider reducing the number of tokens required by using shorter prompts, fine-tuning models, or caching common user queries so that they don't need to be processed repeatedly. You can experiment with OpenAI's interactive tokenizer tool (https://platform.openai.com/tokenizer) to help you estimate costs.
Sources:
- Forum Post: Am I doing something wrong or is the pricing extremely steep? (https://community.openai.com/t/426336)
- Forum Post: Rate limiting (Error code: 429) (https://community.openai.com/t/548748)
- Docs > Guides > Production best practices > Managing costs > Text generation (https://platform.openai.com/docs/guides/production-best-practices)
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
@inner ridge kapa.ai is still learning and improving, please let us know how it did by reacting below
Evaluation: All of the links work. Everything seems relevant and correct, including the forum posts picked to show as further sources.
@drowsy crystal So, about the output cost.
I understand that the previous messages are carried forward, resubmitted.
Output cost seems to be higher than input cost.
Are the model's previous outputs now treated as input for the cost, or do I pay the output cost again for those already provided outputs?
Let me check the knowledge base for you. One moment! :books:
When you're using the OpenAI API, you're charged for both the input and output tokens. The input tokens include the conversation history (both your prompts and the model's previous responses) and the new message you're sending. The output tokens are the new responses generated by the model.
So, when you carry forward the conversation history, the model's previous outputs are treated as input tokens and you're charged the input rate for them. You're not charged the output rate again for these previous responses. The output rate is only applied to the new responses generated by the model in the current API call.
For example, if you're using the gpt-3.5-turbo-1106 model, the cost is $0.0010 per 1K tokens for input and $0.0020 per 1K tokens for output. If your conversation history (including both your prompts and the model's previous responses) is 1K tokens and the model generates 1K tokens as a new response, you would be charged $0.0010 for the 1K input tokens and $0.0020 for the 1K output tokens.
Sources:
- Forum Post: Rate limiting (Error code: 429) (https://community.openai.com/t/548748)
- Docs > Guides > Production best practices > Managing costs > Text generation (https://platform.openai.com/docs/guides/production-best-practices)
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
@inner ridge kapa.ai is still learning and improving, please let us know how it did by reacting below
Good answer (I don't know that it's true, but seems reasonable.)
Source provided from forum, #1184382524038852650 message seems poorly connected to this question though it was better connected to the initial question.