#Fine-Tuning Advice
1 messages · Page 1 of 1 (latest)
Hi @rose axle. I'm kapa.ai, your friendly support bot. I will try to answer your question by searching through the knowledge base. :wave:
Hmm, I don't know enough to give you a confident answer yet. The provided extracts do not contain specific information about how the length of question and answer pairs impacts the learning efficacy and output quality of a fine-tuned AI model. However, it's important to note that fine-tuning is used to refine the way a model responds, including setting the style, tone, format, or other qualitative aspects, and improving reliability at producing a desired output source (https://community.openai.com/t/408427).
For more detailed information on fine-tuning, you may want to refer to the OpenAI fine-tuning guide (https://platform.openai.com/docs/guides/fine-tuning).
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
@rose axle kapa.ai is still learning and improving, please let us know how it did by reacting below
Your finetuning examples should generally represent the kinds of threads you ideally expect. If you have a lot of slack though, I've noticed that shorter examples result in more of an effect than longer ones, but the effect might be too exaggerated at times. If you can help it, having detailed examples and lots of them yield best results imo, but if you expect your threads to only be 3-5 messages long then don't worry about it. It's very use-case dependent
I've had good success with shorter examples as well, and wondered whether I just need more examples to get good results with the longer interactions. I'll just have to experiment!
I once needed to finetune for generating around two sentences at a time, so my finetuning examples had around 5 in bulk each. I noticed that this approach reduced overfitting since with 1 per example would result in it just repeating the training data
It'll take a larger dataset for it to learn if your examples are longer but that can be desirable
How many examples would you say is a minimum to expect some kind of results? The documents say 50 to 100, but if 500 is better then I'll use 500.
You should use as many high-quality examples that fit your budget but I've only ever trained with 50 examples at the very most and have had decent finetunes. iirc the docs suggest starting with a smaller dataset like 60 before scaling up to 500+
if your dataset is just questions and answers it doesn't sound like you'll need a ton of examples especially if it's something the model already knows about