Hi everyone; so my team is working to train the API model to read long government contracts and spit back out necessary/significant information so we would be able to read such contracts efficiently. When using the data-preparation tool on API, I found that the model wants us to limit our entries to <2000 tokens -- has anyone else conducted projects in which they had to work around a character limit? Also, if so, are there any common errors/edge-cases that you've encountered with the data-preparation tool that I could run into when training this data? Thank you
#API Fine-tuning Word Character Limit
7 messages · Page 1 of 1 (latest)
This is a fairly regular problem when trying to use GPT3/3.5 for analysis/summarization of large documents like contracts. You cannot work around the token limit but can work on a chunking the document appropriately and recursively extracting/abstracting each chunk. This is an evolving area and we'd love to understand as well.
How can I API Fine-tuning with chatGPT-turbo0301?
There is any JS code that do that?
That model is not available for fine tuning
So which one?
I'm also encountering the same issue. I tried chunking my text into 1500 words per chunk, then send it to ChatGPT.
I then store the response and then adjust the prompt into something like,
system: "Summarize someething. This was your previous answer. {previousAnswer}"
user: "Continue summarizing for the next text: {nextText}"
But the result is quite bad.