#Pricing of GPT3.5T fine tuning
1 messages · Page 1 of 1 (latest)
Hi @brittle ether. I'm kapa.ai, your friendly support bot. I will try to answer your question by searching through the knowledge base. :wave:
In the context of OpenAI's pricing, Training, Input usage, and Output usage refer to different aspects of using the AI models:
-
Training: This is the cost associated with fine-tuning a model on your specific data. Fine-tuning allows you to customize a model for your application, providing higher quality results, the ability to train on more examples than can fit in a prompt, token savings due to shorter prompts, and lower latency requests. The cost for this process is separate and in addition to the costs for input and output usage. -
Input usage: This refers to the cost of the tokens that you send to the model as input. A token can be as short as one character or as long as one word (e.g., "a" or "apple"). The number of tokens in your input affects the cost, the time it takes, and whether the call works at all based on the model’s maximum limit. -
Output usage: This refers to the cost of the tokens that the model generates as output. The total number of tokens in an API call affects the cost.
The number of tokens necessary to get reliable structured output from the model can vary greatly depending on the complexity of the task and the specific requirements of your application. However, it's important to note that for a longer chat, you will have to pay for the entire history + the new prompt in terms of input tokens.
You can estimate the token usage and cost using the following formula and code snippet:
# estimate inference cost assuming gpt-3.5-turbo (4K context)
i_tokens = result["model_response"].usage.prompt_tokens
o_tokens = result["model_response"].usage.completion_tokens
i_cost = (i_tokens / 1000) * 0.0015
o_cost = (o_tokens / 1000) * 0.002
print(f"""Token Usage
Prompt: {i_tokens} tokens
Completion: {o_tokens} tokens
Cost estimation: ${round(i_cost + o_cost, 5)}""")
For more detailed information, you can refer to the OpenAI Pricing Guide (https://openai.com/pricing) and the Fine-tuning Guide (https://platform.openai.com/docs/guides/fine-tuning).
Sources:
- Community Post: So how much will openai chat gpt 3.5 cost for me? (https://community.openai.com/t/681526)
- Community Post: Fine tuning issue in playground (https://community.openai.com/t/498074)
- Docs > Guides > Fine-tuning (https://platform.openai.com/docs/guides/fine-tuning)
- Examples > Named Entity Recognition to Enrich Text > 5. Let's Enrich a Text with Wikipedia links > 5.3 Token Usage (https://cookbook.openai.com/examples/named_entity_recognition_to_enrich_text#53-token-usage)
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
@brittle ether kapa.ai is still learning and improving, please let us know how it did by reacting below
In that code snippet, what is result? I replaced i_tokens and o_tokens like this:
i_tokens = len("Fully factorise {5q^2 -17q -12}")
o_tokens = len("a) (5q + 3)(q - 4) b)")
i_cost = (i_tokens / 1000) * 0.0015
o_cost = (o_tokens / 1000) * 0.002
print(f"""Token Usage
Prompt: {i_tokens} tokens
Completion: {o_tokens} tokens
Cost estimation: ${round(i_cost + o_cost, 5)}""")```
The only problem I am facing is obviously that is a very low price ($9e-05). This is obviously not the correct question as that is for a single question in the fine tuning set @feral hazel
Let me check the knowledge base for you. One moment! :books:
The result in the code snippet refers to the response you get from the OpenAI API after making a request. The result["model_response"].usage.prompt_tokens and result["model_response"].usage.completion_tokens are fields in the API response that give you the number of tokens used in the prompt and the completion respectively.
In your code, you're calculating the cost based on the length of the strings, which is not the same as the number of tokens. A token in GPT-3.5-turbo can be as short as one character or as long as one word. For example, the string "Fully factorise {5q^2 -17q -12}" is 5 tokens long, not 30. You can use OpenAI's tokenizer tool (https://platform.openai.com/tokenizer) to count the number of tokens in a string.
Moreover, the cost you're calculating is for a single inference (i.e., a single question-answer pair), not for fine-tuning. Fine-tuning involves training the model on a large dataset over multiple epochs, which can involve billions of tokens and hence be much more expensive.
To estimate the cost of fine-tuning, you can use the following formula:
base cost per 1k tokens * number of tokens in the input file * number of epochs trained
For example, if you have a training file with 100,000 tokens and you train over 3 epochs, the expected cost would be ~$2.40 USD.
Sources:
- Docs > Guides > Fine-tuning > Preparing your dataset > Estimate costs (https://platform.openai.com/docs/guides/fine-tuning)
- Examples > Data preparation and analysis for chat model fine-tuning > Cost Estimation (https://cookbook.openai.com/examples/chat_finetuning_data_prep#cost-estimation)
- Docs > Guides > Production best practices > Managing costs > Text generation (https://platform.openai.com/docs/guides/production-best-practices)
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
@brittle ether kapa.ai is still learning and improving, please let us know how it did by reacting below
Can you generate a full snippet of code to show input and output maybe using # --- snip --- if you dont' want to populate the entire input and output dataset.
And also can you use tiktoken if required @feral hazel
Let me check the knowledge base for you. One moment! :books:
Sure, here's an example of how you might use tiktoken to count the number of tokens in your input and output strings, and then calculate the cost of a single inference:
from tiktoken import Tokenizer
from tiktoken.models import Model
# Initialize the tokenizer and the GPT model
tokenizer = Tokenizer()
model = Model()
# Your input and output strings
input_str = "Fully factorise {{5q^2 -17q -12}}"
output_str = "a) (5q + 3)(q - 4) b)"
# Tokenize the strings and count the tokens
input_tokens = len(list(tokenizer.tokenize(input_str)))
output_tokens = len(list(tokenizer.tokenize(output_str)))
# Calculate the cost
input_cost = (input_tokens / 1000) * 0.0015
output_cost = (output_tokens / 1000) * 0.002
print(f"""Token Usage
Prompt: {input_tokens} tokens
Completion: {output_tokens} tokens
Cost estimation: ${round(input_cost + output_cost, 5)}""")
Please note that this is the cost for a single inference, not for fine-tuning. The cost of fine-tuning would be much higher and depends on the size of your training dataset and the number of training epochs.
Sources:
- Examples > How to count tokens with Tiktoken > 1. Import
tiktoken(https://cookbook.openai.com/examples/how_to_count_tokens_with_tiktoken#1-import-tiktoken) - Examples > How to count tokens with Tiktoken > 6. Counting tokens for chat completions API calls Part 3 (https://cookbook.openai.com/examples/how_to_count_tokens_with_tiktoken#6-counting-tokens-for-chat-completions-api-calls)
- Docs > Guides > Embeddings > Frequently asked questions > How can I tell how many tokens a string has before I embed it? (https://platform.openai.com/docs/guides/embeddings)
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
@brittle ether kapa.ai is still learning and improving, please let us know how it did by reacting below
Is this what you mean?
import tiktoken
enc = tiktoken.encoding_for_model("gpt-3.5")
# Your input and output strings
input_str = "Fully factorise {{5q^2 -17q -12}}"
output_str = "a) (5q + 3)(q - 4) b)"
# Tokenize the strings and count the tokens
input_tokens = len(list(enc.encode(input_str)))
output_tokens = len(list(enc.encode(output_str)))
# Calculate the cost
input_cost = (input_tokens / 1000) * 0.0015
output_cost = (output_tokens / 1000) * 0.002
print(f"""Token Usage
Prompt: {input_tokens} tokens
Completion: {output_tokens} tokens
Cost estimation: ${round(input_cost + output_cost, 5)}""")
And also this is stil $5e-05. Is this really enough tokens for the model to train on? Surely not @feral hazel
Let me check the knowledge base for you. One moment! :books:
Your code snippet is correct for counting tokens in a string using tiktoken. However, the cost you're calculating is for a single inference (i.e., a single question-answer pair), not for fine-tuning.
Fine-tuning involves training the model on a large dataset over multiple epochs, which can involve billions of tokens and hence be much more expensive. The cost of fine-tuning is not calculated on a per-inference basis, but rather on the total number of tokens in your training dataset and the number of epochs you train for.
Here's an example of how you might calculate the cost of fine-tuning:
# Assume you have a training dataset with 1 million tokens
# and you're planning to train for 3 epochs
num_tokens = 1_000_000
num_epochs = 3
# The cost per token for fine-tuning is $0.06 per 1k tokens
cost_per_token = 0.06 / 1000
# The total cost is the number of tokens times the cost per token times the number of epochs
total_cost = num_tokens * cost_per_token * num_epochs
print(f"Total cost for fine-tuning: ${total_cost}")
Please note that this is a simplified example and the actual cost may vary depending on various factors. For more accurate estimates, you should refer to the OpenAI Pricing Guide (https://openai.com/pricing) and the Fine-tuning Guide (https://platform.openai.com/docs/guides/fine-tuning).
Sources:
- Docs > Guides > Fine-tuning > Preparing your dataset > Estimate costs (https://platform.openai.com/docs/guides/fine-tuning)
- Examples > Data preparation and analysis for chat model fine-tuning > Cost Estimation (https://cookbook.openai.com/examples/chat_finetuning_data_prep#cost-estimation)
- Docs > Guides > Production best practices > Managing costs > Text generation (https://platform.openai.com/docs/guides/production-best-practices)
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
@brittle ether kapa.ai is still learning and improving, please let us know how it did by reacting below
@feral hazel When using fine tuning. Can I setup something in my service from OpenAI that acts as a server for fine tuning and run the fine tuning locally to avoid the costs? AKA self-hosted fine tuning for GPT. Or does OpenAI not provide this?
Let me check the knowledge base for you. One moment! :books:
OpenAI currently does not support self-hosted fine-tuning. All fine-tuning of OpenAI models must be done through the OpenAI API, which ensures that the process is carried out in a controlled environment that meets OpenAI's standards for safety and ethics. This also allows OpenAI to manage the computational resources required for fine-tuning, which can be substantial.
You can, however, run the fine-tuned models locally for inference (i.e., generating predictions) after the fine-tuning process is complete. This involves making API calls to the fine-tuned model, which can be done from your local machine or server.
For more information, please refer to the OpenAI Fine-Tuning Guide (https://platform.openai.com/docs/guides/fine-tuning).
Sources:
- Docs > Guides > Fine-tuning (https://platform.openai.com/docs/guides/fine-tuning)
- Community Post: Creating paywall for premium version of custom GPT > Answers Part 4 (https://community.openai.com/t/526788)
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
@brittle ether kapa.ai is still learning and improving, please let us know how it did by reacting below
You can, however, run the fine-tuned models locally for inference (i.e., generating predictions) after the fine-tuning process is complete. This involves making API calls to the fine-tuned model, which can be done from your local machine or server.
So you can actually run the model locally for free after it is fine tuned? Or you can request the API to do it on the OpenAI servers? @feral hazel
Let me check the knowledge base for you. One moment! :books:
I apologize for any confusion. When I mentioned running the fine-tuned models locally, I meant that you can make API calls to the fine-tuned model from your local machine or server. However, the actual model inference (i.e., generating predictions) is still performed on OpenAI's servers, not on your local machine.
Each API call to the fine-tuned model incurs a cost based on the number of tokens in the input and output. This cost applies regardless of whether the API call is made from a local machine, a server, or a cloud environment.
OpenAI does not currently support running their models entirely locally on your own hardware for free. All interactions with the models, including both fine-tuning and inference, must go through the OpenAI API.
For more information, please refer to the OpenAI Pricing Guide (https://openai.com/pricing) and the Fine-tuning Guide (https://platform.openai.com/docs/guides/fine-tuning).
Sources:
- Docs > Guides > Fine-tuning (https://platform.openai.com/docs/guides/fine-tuning)
- Community Post: Creating paywall for premium version of custom GPT > Answers Part 4 (https://community.openai.com/t/526788)
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
@brittle ether kapa.ai is still learning and improving, please let us know how it did by reacting below
@feral hazel When using a fine tuned model, if the model is based off 3.5, does the GPT 3.5 pricing apply or different pricing?
Let me check the knowledge base for you. One moment! :books:
When using a fine-tuned model based on GPT-3.5, the pricing for the fine-tuned model usage is different from the base GPT-3.5 model. The cost of using a fine-tuned model includes the cost of the tokens used in the API calls (both input and output tokens), similar to the base model, but the rate per token may be different.
The exact pricing details for fine-tuned models can be found on the OpenAI Pricing page (https://openai.com/pricing). It's important to note that the cost of fine-tuning the model (i.e., the training process) is separate and in addition to the cost of using the fine-tuned model.
Sources:
- Docs > Guides > Fine-tuning (https://platform.openai.com/docs/guides/fine-tuning)
- Community Post: Any idea when openai will reduce Custom Fine Tune model api inference cost? > Official Answer (https://community.openai.com/t/634285)
- Community Post: What is the price of gpt-3.5-turbo? > Official Answer (https://community.openai.com/t/554656)
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
@brittle ether kapa.ai is still learning and improving, please let us know how it did by reacting below
@hardy notch So you're trying to tell me the cost of using a fine tuned GPT 3.5 tripples over using the base GPT3.5!? why? there shouldn't really be any extra computational cost after training for OpenAI
Oh apologies... that's not Kapa... no wonder it didn't respond
@feral hazel So you're trying to tell me the cost of using a fine tuned GPT 3.5 tripples over using the base GPT3.5!? why? there shouldn't really be any extra computational cost after training for OpenAI
Let me check the knowledge base for you. One moment! :books:
The pricing for using a fine-tuned model is set by OpenAI and may be different from the base model. While it might seem that there should be no extra computational cost after training, there are several factors that could contribute to the higher cost of using a fine-tuned model:
-
Resource Allocation: Fine-tuned models are stored separately and may require dedicated resources to be available for inference. This could lead to higher operational costs.
-
Model Management: Fine-tuned models need to be managed separately, including versioning, monitoring, and maintenance, which could also contribute to the cost.
-
Value Proposition: Fine-tuned models are designed to provide higher quality results for specific tasks compared to the base model. This added value could be reflected in the pricing.
It's also worth noting that the cost of fine-tuning (i.e., the training process) is separate and in addition to the cost of using the fine-tuned model.
For the exact pricing details, you should refer to the OpenAI Pricing page (https://openai.com/pricing).
Sources:
- Docs > Guides > Fine-tuning (https://platform.openai.com/docs/guides/fine-tuning)
- Community Post: Any idea when openai will reduce Custom Fine Tune model api inference cost? > Official Answer (https://community.openai.com/t/634285)
- Community Post: What is the price of gpt-3.5-turbo? > Official Answer (https://community.openai.com/t/554656)
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
@brittle ether kapa.ai is still learning and improving, please let us know how it did by reacting below
I still don't really see why it would have quite such a high cost 3x still seems extreme and makes me want to self host AI... unfortunately I cannot due to the math capabilities of GPT being required here