Large Unstructured Dataset | OpenAI | Page 1

oak mountain Dec 30, 2022, 5:41 AM

Making a thread so this conversation doesn't clog up the chat 🙂

Firstly, I am not an expert in any of this by any measure. I'm just another person looking to leverage this to make certain tasks easier for myself and others, or to enable use cases that weren't possible before.

But if you are going to use GPT-3, it sounds like you need a fine-tuned model.

With a fine-tuned model you can provide lots of example data and teach it what good responses to specific prompts look like. The training data for a fine-tune is a JSONL file with one object per line, each holding both keys:
{"prompt": "[insert prompt here]", "completion": "[insert completion here]"}
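A common mistake is writing the prompt and completion as two separate objects; each line of the file needs to be one object containing both. Here is a minimal sketch that writes such a file. The example pairs, the ` ->` separator at the end of each prompt, and the leading space and trailing newline in each completion are assumptions for illustration, not something from this thread.

```python
import json

# Hypothetical prompt/completion pairs; replace with your own data.
# The " ->" separator and the leading space / trailing newline in each
# completion are assumed conventions, not hard requirements.
examples = [
    {"prompt": "Summarize: Revenue grew 10% year over year. ->",
     "completion": " Revenue was up 10%.\n"},
    {"prompt": "Summarize: Operating costs fell slightly. ->",
     "completion": " Costs dipped a little.\n"},
]


def write_jsonl(records, path):
    """Write one JSON object per line (JSON Lines format)."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            # Each line is a single flat object holding BOTH keys.
            f.write(json.dumps(record) + "\n")


if __name__ == "__main__":
    write_jsonl(examples, "training_data.jsonl")
```

The resulting `training_data.jsonl` is the kind of file you would upload when creating a fine-tune; check OpenAI's fine-tuning guide for the current upload steps and recommended formatting.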

Large Dataset

Large Unstructured Dataset

I'm not sure what the best method would be to provide it all of the information it needs to adequately address your need.

With that said, based on what you've shared, there is a YouTube video that may be instructive for you. I personally haven't watched it, so I can't vouch for it, but it has come up on my page a lot:
3. OpenAI API Python - Earnings Call Summarization by "Part Time Larry"

distant sigil Dec 30, 2022, 2:36 PM

oak mountain With that said, based one what you've shared there is a youtube video that may b...

Seems like for a large dataset you need to use embeddings

Fine-tuning is mostly used to adjust the way questions are answered
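To make the embeddings suggestion concrete: the usual pattern is to split the large dataset into chunks, turn each chunk into a vector, turn the question into a vector the same way, and send only the most similar chunks to the model along with the question. Below is a minimal offline sketch under stated assumptions. `ToyEmbedder` is a hypothetical word-count stand-in for a real embedding model (such as OpenAI's `text-embedding-ada-002`), and the chunk size, prompt wording, and every function name here are illustrative, not part of any library.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class ToyEmbedder:
    """Hypothetical bag-of-words stand-in for a real embedding model.

    A real setup would send each chunk to an embeddings model and store the
    dense vector it returns; this class only counts words so the sketch runs offline.
    """

    def __init__(self, corpus):
        vocab = sorted({word for doc in corpus for word in tokenize(doc)})
        self.positions = {word: i for i, word in enumerate(vocab)}

    def embed(self, text):
        vec = [0.0] * len(self.positions)
        for word, count in Counter(tokenize(text)).items():
            if word in self.positions:  # words never seen in the corpus are ignored
                vec[self.positions[word]] += count
        return vec


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def chunk_text(text, max_words=200):
    """Split a long document into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def build_index(embedder, chunks):
    """Embed every chunk once and keep (chunk, vector) pairs."""
    return [(chunk, embedder.embed(chunk)) for chunk in chunks]


def search(embedder, index, question, k=3):
    """Return the k chunks whose vectors are most similar to the question's."""
    q = embedder.embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question, context_chunks):
    """Paste only the relevant chunks into the prompt instead of the whole dataset."""
    context = "\n\n".join(context_chunks)
    return f"Answer using the context below.\n\nContext:\n{context}\n\nQuestion: {question}\nAnswer:"


if __name__ == "__main__":
    docs = [
        "Revenue grew ten percent this quarter driven by cloud sales.",
        "The company opened two new offices in Europe.",
        "Operating costs fell because of lower hardware prices.",
    ]
    embedder = ToyEmbedder(docs)
    index = build_index(embedder, docs)
    best = search(embedder, index, "How much did revenue grow?", k=1)
    print(build_prompt("How much did revenue grow?", best))
```

The design choice is that the model never sees the whole dataset at once: the embedding search narrows it down to a few relevant chunks, which keeps the prompt within the model's length limit. A real embedding model would also match related wording (like "grow" and "grew") that this word-count stand-in misses.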

oak mountain Dec 30, 2022, 3:22 PM

As I understand it, fine-tuning is there to take the vast, unstructured language understanding that an LLM has and focus it on a specific set of data.

oak mountain Dec 30, 2022, 3:24 PM

distant sigil Seems like for a large data set you need to use embedding

I'm not sure what you mean by "you need to use embedding". I'm still new to this technology and don't understand what you've said. Will you please explain what you mean?

distant sigil Dec 30, 2022, 5:30 PM

oak mountain I'm not sure what you mean by "you need to use embedding". I'm still new to this...

I'm still learning about it myself, but it's a feature called embeddings

Trying to figure it out myself rn
