Response / Analysis LLM Discussion | OpenAI | Page 1

eternal turret · 2024-03-26T16:34:52.423Z

Thanks A couple follow up questions if you don't mind so I can better help you. - Which model are you using? - What temperature? - If you don't mind sharing an example of a prompt with me that would be awesome

eternal turret Mar 26, 2024, 4:34 PM

#

Thanks @limpid sky

A couple follow up questions if you don't mind so I can better help you.

Which model are you using?
What temperature?
If you don't mind sharing an example of a prompt with me that would be awesome

#

If you want to keep it private though I understand and I can just point you in the direction of some easy fixes.

#

I'm pulling up an example of my own attempt at this with my bot in Playground real quick.

limpid sky Mar 26, 2024, 4:39 PM

#

I'm using 3.5 currently and I understand that switching to 4 will most definitely improve the quality, but for now I'm trying to make do with that.
Temperature is 0.3, the prompt I used was "Every time you respond, add a "\t" after you're done speaking. Then analyze the tone of USER's last message and append it to your response. Answer "YES" if you understand."

#

It works flawlessly at first, but at some point it seems to occasionally start forgetting stuff or incorrectly presenting it.

#

One thought I had was to remind the model constantly to follow this instruction (something to do with context length? Still learning the terminology) but that seems inefficient.

#

In general, if I understand correctly, you're supposed to add a bunch of context to the user's message under the hood, correct? If the user inputs "Hi", I would append/prepend a bunch of instructions as "system". Or is it fine to just keep it in history?

eternal turret Mar 26, 2024, 4:45 PM

#

Getting back to this now, was just answering a quick question about embeddings.

limpid sky Mar 26, 2024, 4:45 PM

#

No rush

eternal turret Mar 26, 2024, 4:50 PM

#

A few quick remarks from what I can see initially:

The prompts get easily confused if you have multiple "answers". For example if I take out Answer YES if you understand then suddenly the model starts behaving.
The way you ask for a special character also affects the model. Instead of asking it to put it after the response, try putting it before the analysis.

Here's what I get on 0.3 temperature with GPT 3.5 Turbo using this updated prompt

Every time you respond, speak directly to the user. Then analyze the tone of USER's last message and append it to your response.  Add a "\t" before analyzing the tone.

limpid sky Mar 26, 2024, 4:52 PM

#

eternal turret A few quick remarks from what I can see initially: - The prompts get easily con...

Yep, looks very similar to what I get. The prompt is also a lot more understandable than what I had lol. How do I deal with deterioration over time though?

eternal turret Mar 26, 2024, 4:55 PM

#

limpid sky Yep, looks very similar to what I get. The prompt is also a lot more understanda...

That's a really good question. If you can come up with a good answer then you'd have the admiration of the entire OpenAI community. 😁

My method to prevent it was to break this style of prompt up into separate prompts.

For example, my bot asks the following questions when it receives a new message:

How can I help the user and what is their tone?
Do I have enough information to respond or do I need more context?
Are my memories enough context or should I ask the user for more information?
Finally, I append all of the answers above to a single prompt and ask it to use all of the information provided to best respond to the user.

#

The reason for breaking it up to prevent deterioration is that you may suddenly get the wrong behavior from a single part of that "chain", and it's easier to fix that one simple part than to fix the whole prompt. I've spent years chasing "Single-Attempt / Single-Shot" prompts around and they always deteriorate over time unfortunately.

#

By keeping each prompt itself to one, or at most two focused questions, you can keep that reliability high.

limpid sky Mar 26, 2024, 4:58 PM

#

So every time I send a call, I'm should be appending several prompts to the history anyway?

#

Or do you mean sending several calls, waiting for each of them, then appending to a single reponse?

eternal turret Mar 26, 2024, 4:59 PM

#

If you have multiple passes of analysis you want to do then yes definitely, make a series of prompts with no history and then combine them together.

Even a simple two-shot prompt in your case would increase reliability, e.g. you could use GPT-3.5 to get the tone of the user. Then pass its response to GPT-4 and make a "human sounding response"

eternal turret Mar 26, 2024, 4:59 PM

#

limpid sky Or do you mean sending several calls, waiting for each of them, then appending t...

Yep, that's what I mean

limpid sky Mar 26, 2024, 4:59 PM

#

eternal turret Yep, that's what I mean

Wouldn't that introduce noticeable latency?

eternal turret Mar 26, 2024, 5:00 PM

#

limpid sky Wouldn't that introduce noticeable latency?

Unfortunately yes, it takes response time up to about 5 seconds.

limpid sky Mar 26, 2024, 5:00 PM

#

eternal turret Unfortunately yes, it takes response time up to about 5 seconds.

Oof

eternal turret Mar 26, 2024, 5:00 PM

#

I try to run as many questions in parallel as possible, and I try to make requests in-between messages to "think ahead" to save time, but yeah there's no getting around the speed issue.

You'd need to run a local GPT model if you want very fast responses.

limpid sky Mar 26, 2024, 5:01 PM

#

So getting back to my initial idea, would fine-tuning help to remedy this issue?

#

Or is it better suited for cofining the model to a particular "style", not hard limitations?

eternal turret Mar 26, 2024, 5:03 PM

#

I would follow OpenAI's advice here right from their docs:

Fine-tuning OpenAI text generation models can make them better for specific applications, but it requires a careful investment of time and effort. We recommend first attempting to get good results with prompt engineering, prompt chaining (breaking complex tasks into multiple prompts), and function calling, with the key reasons being:

There are many tasks at which our models may not initially appear to perform well, but results can be improved with the right prompts - thus fine-tuning may not be necessary

In cases where fine-tuning is still necessary, initial prompt engineering work is not wasted - we typically see best results when using a good prompt in the fine-tuning data (or combining prompt chaining / tool use with fine-tuning)

#

Your prompt seems safe enough that you should be able to get far with it just by tweaking it. If it's just analysis + response then GPT is well suited to it in my opinion.

#

Ultimately though the work won't be wasted. If prompt tweaking can't get you the result you want then you'll need the prompt for the fine-tuning anyway.

limpid sky Mar 26, 2024, 5:07 PM

#

eternal turret Ultimately though the work won't be wasted. If prompt tweaking can't get you the...

I see. I gotta admit that I might use this as an excuse to learn fine-tuning anyway. Haven't done it yet so I think I'll try the prompt chaining first, record the performance, then try to come up with a good tuning set and see how it compares after.

eternal turret Mar 26, 2024, 5:10 PM

#

Never hurts to try it out if you're feeling up to fine tuning. :)

#

Good luck with the chaining. Feel free to poke me or ask around the channels any time if you need help with it.

limpid sky Mar 26, 2024, 5:11 PM

#

eternal turret Good luck with the chaining. Feel free to poke me or ask around the channels any...

Thanks for your time. Something tells me I'm gonna be asking more question real soon 😅

eternal turret Mar 26, 2024, 5:12 PM

#

limpid sky Thanks for your time. Something tells me I'm gonna be asking more question real ...

Gladly! That's what I'm here for.
I also tend to learn a lot from the questions people ask so I appreciate it!

#Response / Analysis LLM Discussion