#Creating Prompts and Completions for Fine-Tuning


severe pewter
#

Say I'm trying to create a chatbot for real estate projects using a single model, e.g. davinci. I have a set of prompts and completions for each of projects A and B. However, some prompt-completion pairs don't name the project anywhere in their text, so the model wouldn't know which project a given pair belongs to.

Given this, a workaround I have thought of is to add keywords at the end of the prompts, e.g. I will add (Project A) or (Project B) at the end of each prompt. By doing this, I hope the model will learn that each prompt-completion pair is tied to a particular project. Is this common practice for training NLP models?
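For reference, a minimal sketch of the tagging idea described above, assuming the training data is written as JSON Lines with `prompt` and `completion` fields. The helper names `tag_prompt` and `to_jsonl` and the sample pair are made up for illustration, and whether the tag actually helps the model is not something this sketch can show.

```python
import json

def tag_prompt(prompt: str, project: str) -> str:
    """Append a project label such as '(Project A)' to the end of a prompt."""
    return f"{prompt.strip()} ({project})"

def to_jsonl(pairs, project):
    """Serialize (prompt, completion) pairs as JSON Lines, one record per line."""
    lines = []
    for prompt, completion in pairs:
        record = {
            "prompt": tag_prompt(prompt, project),
            # Leading space on the completion is an assumed formatting convention.
            "completion": " " + completion.strip(),
        }
        lines.append(json.dumps(record))
    return "\n".join(lines)

# Hypothetical example pair, not taken from a real project.
pairs_a = [("How many bedrooms does the show unit have?", "The show unit has three bedrooms.")]
jsonl_a = to_jsonl(pairs_a, "Project A")
print(jsonl_a)
```

The same tag would also have to be appended to every prompt at inference time, otherwise the model sees inputs that differ from its training data.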

wet cloud
#

Why not fine-tune two separate models and then select which model to use based on the project the client is chatting about?
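A rough sketch of what that selection could look like, assuming the project is already known by the time a question arrives. The model identifiers are placeholders, not real model names, and `select_model` is a hypothetical helper.

```python
# Placeholder identifiers; real fine-tuned model names would come from your provider.
MODEL_BY_PROJECT = {
    "Project A": "fine-tuned-model-for-project-a",
    "Project B": "fine-tuned-model-for-project-b",
}

def select_model(project: str) -> str:
    """Return the model registered for a project, failing loudly if there is none."""
    try:
        return MODEL_BY_PROJECT[project]
    except KeyError:
        raise ValueError(f"No fine-tuned model registered for {project!r}") from None

print(select_model("Project A"))
```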

severe pewter
#

We were worried about having limited data. But will give it a go too. Thanks for the suggestion.

severe pewter
#

@safe moss it didn't work on our end, even for one project (e.g. project A). We were trying to get the davinci model to answer five specific questions relating to project A after fine-tuning, but the answers it gave were unsatisfactory.

What we tried: Paraphrasing one question in 20 different ways and sending it through the model.

safe moss
#

Ah gotcha. Sorry that it didn't work out for you. It seems like it takes a lot of trial and error to program AI

timid reef
#

Use embeddings. It would probably be best to run a classifier first, so you retrieve the information for the right project.

buoyant mango
#

at least Q/A like that

#

many other functions also

timid reef
#

(replying to severe pewter: "Hi, sorry if this question sounds elementary, but even if w...")

By classifiers I mean that you need to know which project's embeddings to search. So the first step in a conversation would be something like:

Prompt: Do we know which Project the Question is about?
Question: "<some Project-related Question>"
Completion: Project-0103-Housing

Then in the next step you embed the actual question and search a vector DB with it, but you restrict the query to the entries that belong to that project.
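A small sketch of that project-restricted search, assuming the vectors already exist. The three-dimensional vectors, the `store` layout, the `Project-B` label, and the `search` helper are all made up for illustration. A real vector DB would normally do the filtering through its own metadata filter.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def search(store, query_vector, project, top_k=1):
    """Rank only the entries tagged with `project` by similarity to the query."""
    candidates = [entry for entry in store if entry["project"] == project]
    ranked = sorted(
        candidates,
        key=lambda entry: cosine_similarity(entry["vector"], query_vector),
        reverse=True,
    )
    return ranked[:top_k]

# Toy vectors standing in for real embeddings.
store = [
    {"project": "Project-0103-Housing", "text": "Housing: unit sizes", "vector": [0.9, 0.1, 0.0]},
    {"project": "Project-0103-Housing", "text": "Housing: parking", "vector": [0.1, 0.9, 0.0]},
    {"project": "Project-B", "text": "Other project: unit sizes", "vector": [1.0, 0.0, 0.0]},
]

hits = search(store, [1.0, 0.0, 0.0], "Project-0103-Housing")
print(hits[0]["text"])
```

Note that the `Project-B` entry is the closest match overall, but it is never considered because the filter runs before the ranking.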

#

You don't really need fine-tuning for the classifiers. I mainly use few-shot examples with the GPT-3.5 family and it works fine.
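To make the few-shot idea concrete, here is a sketch that only builds the prompt text and sends nothing to a model. The example questions and the `Project-0207-Office` label are invented, and `build_classifier_prompt` is a hypothetical helper.

```python
# Invented few-shot examples; only the first label appears earlier in this thread.
FEW_SHOT_EXAMPLES = [
    ("What is the floor area of the housing units?", "Project-0103-Housing"),
    ("When does the office tower open?", "Project-0207-Office"),
]

def build_classifier_prompt(question, examples=FEW_SHOT_EXAMPLES):
    """Assemble an instruction, the labelled examples, and the open question."""
    parts = ["Do we know which Project the Question is about? Reply with the project label only."]
    for example_question, label in examples:
        parts.append(f'Question: "{example_question}"\nCompletion: {label}')
    # The final block is left open so the model fills in the label.
    parts.append(f'Question: "{question}"\nCompletion:')
    return "\n\n".join(parts)

prompt = build_classifier_prompt("How big are the housing units?")
print(prompt)
```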

#

Even with turbo.

#

Especially if you only have a few dozen questions per project, it makes sense to have something more general in the first step.

buoyant mango
#

I can do specific document search just fine without them

timid reef
#

Skip the classifier step at the start, like @buoyant mango said, and come back to it later once you understand embeddings.

severe pewter
#

Ahh, I think I understand what you mean:

Preparation: Convert prompts and completions for projects A and B into two separate word embedding databases.

Step 1: Classification of prompt into Project A or B. Let's say we classify a prompt into Project A.

Step 2: Convert the prompt into an embedding and compare it to the database of Project A via a distance-based measure.

Step 3: Use GPT-3.5 to answer my prompt based on the most similar entry retrieved from the database of Project A.

Please do correct me if I'm wrong.
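The preparation and three steps above can be sketched end to end as follows. Everything here is a stand-in: `embed` is a toy bag-of-words counter rather than a real embedding model, `classify` is a keyword check rather than a model call, the passages are made up, and Step 3 stops at building the prompt instead of calling GPT-3.5.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().replace("?", "").replace(".", "").split())

def similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def classify(question):
    """Toy stand-in for Step 1; a real system might use a few-shot prompt instead."""
    return "Project A" if "project a" in question.lower() else "Project B"

# Preparation: one small database of embedded passages per project (made-up text).
databases = {
    "Project A": ["Project A has 120 units across two towers.", "Project A parking is underground."],
    "Project B": ["Project B has 40 landed houses."],
}
index = {name: [(p, embed(p)) for p in passages] for name, passages in databases.items()}

def answer_prompt(question):
    project = classify(question)   # Step 1: pick the project
    q_vec = embed(question)        # Step 2: embed and find the closest passage
    best, _ = max(index[project], key=lambda item: similarity(q_vec, item[1]))
    # Step 3: this prompt is what would be sent to the model.
    return f"Answer using only this context.\nContext: {best}\nQuestion: {question}\nAnswer:"

print(answer_prompt("How many units does Project A have?"))
```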

timid reef
#

Yeah, in a nutshell that's correct. You could even skip the prompts in the preparation step and only create embeddings for your real project documentation.

#

And search against those. Embeddings basically do nothing more than let you search by similarity.

#

Or at least that's what you're going to do with them in a QA case.
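If you embed the documentation directly, it usually has to be cut into passages first. The sketch below shows one possible way to do that, splitting on blank lines and tagging each chunk with its project. The chunking strategy, the `max_chars` limit, and the sample text are assumptions, not something prescribed in this thread.

```python
def chunk_document(text, project, max_chars=200):
    """Group paragraphs into chunks of roughly max_chars, tagged with their project.

    A single paragraph longer than max_chars is kept whole rather than split.
    """
    chunks, current = [], ""
    for paragraph in (p.strip() for p in text.split("\n\n")):
        if not paragraph:
            continue
        if current and len(current) + len(paragraph) + 1 > max_chars:
            chunks.append({"project": project, "text": current})
            current = paragraph
        else:
            current = f"{current}\n{paragraph}" if current else paragraph
    if current:
        chunks.append({"project": project, "text": current})
    return chunks

# Made-up documentation text for illustration.
doc = "Tower 1 has 60 units.\n\nTower 2 has 60 units.\n\n" + "Parking details. " * 15
chunks = chunk_document(doc, "Project A")
for chunk in chunks:
    print(chunk["project"], len(chunk["text"]))
```

Each chunk's `text` is what you would embed, and the `project` field is what a later search can filter on.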

severe pewter
#

Noted. Thank you for the suggestions everyone. Will try it out this weekend after my 9 to 5 job. Really appreciate the help! 🙏

bright stag
#

Can you please shed some light on how your prompt was structured?