#Maximum examples fine-tuning?

zealous hemlock

I am trying to fine-tune a gpt-3.5-turbo-1106 model so that it properly classifies project descriptions into certain categories. I have about 300,000 rows of training data, but the OpenAI environment only seems to let me upload about 9,700 rows. Anything above that produces an error. Is there a maximum number of examples? I also checked the token count of each training prompt: none exceeds 1,000 tokens, so each is well below the token limit. The file size also never exceeds 1GB.

civic sinew BOT

Hi @zealous hemlock. I'm kapa.ai, your friendly support bot. I will try to answer your question by searching through the knowledge base. :wave:

The OpenAI fine-tuning guide does not specify a maximum number of examples you can use for fine-tuning. It does recommend starting with 50 well-crafted demonstrations and checking whether the model shows signs of improvement after fine-tuning. The guide also notes that clear improvements are typically seen from fine-tuning on 50 to 100 training examples with gpt-3.5-turbo, but that the right number varies greatly with the exact use case (source: https://platform.openai.com/docs/guides/fine-tuning).

As for the token limits, they depend on the model you select. For gpt-3.5-turbo-1106, the maximum context length is 16,385 tokens, so each training example is also limited to 16,385 tokens (source: https://platform.openai.com/docs/guides/fine-tuning).
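
To double-check the per-example limit locally, here is a minimal sketch. It assumes the training file is chat-format JSONL (one JSON object per line, each with a `messages` list). The names `find_long_examples`, `count_tokens`, and `whitespace_count` are hypothetical, and the whitespace counter is only a crude stand-in for a real tokenizer. The sketch also ignores any per-message overhead tokens, so treat its totals as approximate.

```python
import json

def find_long_examples(path, count_tokens, limit=16385):
    """Return (line_number, token_count) for every example whose content exceeds `limit`."""
    too_long = []
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue  # skip blank lines in this check
            example = json.loads(line)
            # Sum the token counts of every message's text content.
            total = sum(count_tokens(m.get("content") or "") for m in example["messages"])
            if total > limit:
                too_long.append((line_no, total))
    return too_long

def whitespace_count(text):
    """Stand-in counter (assumption): counts whitespace-separated words, not real tokens."""
    return len(text.split())
```

For real counts you could pass a tokenizer-backed function instead, for example one built on the `tiktoken` package: `enc = tiktoken.encoding_for_model("gpt-3.5-turbo")` followed by `lambda text: len(enc.encode(text))`.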

If you're encountering an error when trying to upload more than 9,700 rows of training data, it might be due to other factors not directly related to the number of examples or token limits. It could be related to the size of the file you're trying to upload, or there might be some other restrictions or limitations in the OpenAI environment that are not explicitly mentioned in the documentation.
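
One way to narrow this down before contacting support is to scan the file yourself. The sketch below assumes chat-format JSONL with a `messages` list per line; `check_training_file` is a hypothetical helper, not part of any OpenAI tooling. It reports the file size, the number of well-formed rows, and each line that fails to parse or lacks `messages`.

```python
import json
import os

def check_training_file(path):
    """Return (size_bytes, n_valid, problems); problems is a list of (line_number, reason)."""
    problems = []
    n_valid = 0
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            try:
                example = json.loads(line)
            except json.JSONDecodeError as exc:
                # Blank or malformed lines end up here.
                problems.append((line_no, f"invalid JSON: {exc.msg}"))
                continue
            if not isinstance(example, dict) or not isinstance(example.get("messages"), list):
                problems.append((line_no, "missing 'messages' list"))
                continue
            n_valid += 1
    return os.path.getsize(path), n_valid, problems
```

If the first reported line number sits close to the row count where uploads start failing, that row is worth inspecting. This is only a guess at one possible cause, not a confirmed diagnosis.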

I would recommend reaching out to OpenAI support for further assistance with this issue. They might be able to provide more specific guidance based on the exact error message you're receiving.

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon: