#how do I have my API remember past parts of conversation?
You can append the conversation history to the prompt that you’re sending. For example:
Human: hey!
GPT: Hey there, how can I help?
Human: my name is Kaveen!
GPT:
that would be an example of 1 prompt sent to the model, after the human says “my name is Kaveen!”
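A minimal sketch of building that kind of prompt from a stored history. The function name `build_prompt` and the (speaker, text) tuple layout are just illustrative choices, not anything the API requires:

```python
def build_prompt(history, user_message, human_label="Human", bot_label="GPT"):
    """Render earlier turns plus the new message as one completion prompt."""
    lines = [f"{speaker}: {text}" for speaker, text in history]
    lines.append(f"{human_label}: {user_message}")
    # The last line is left open so the model writes the reply itself.
    lines.append(f"{bot_label}:")
    return "\n".join(lines)


history = [("Human", "hey!"), ("GPT", "Hey there, how can I help?")]
prompt = build_prompt(history, "my name is Kaveen!")
print(prompt)
```

The resulting string is what you would send as the prompt; after the model answers, you append both the user message and the reply to `history` for the next turn.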
Is this counted in the max_tokens associated with the request? Will longer conversations eventually be capped?
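Yes. The prompt and the max_tokens you reserve for the reply have to fit in the model's context limit together.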
So if I was using this method to create a chatbot, would it only be able to talk for a while before being capped?
Well, if you give it, say, the last 10 messages and update that window every time there is a new one, you can talk to it forever. But if you just add everything to the log, yes, it will cap out.
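One way to keep only the last N messages is a bounded deque, which drops the oldest entry automatically. This is a sketch; the class name `RollingHistory` and the default of 10 are assumptions for illustration:

```python
from collections import deque


class RollingHistory:
    """Keeps only the most recent `max_messages` turns."""

    def __init__(self, max_messages=10):
        # A deque with maxlen discards the oldest item when it is full.
        self.messages = deque(maxlen=max_messages)

    def add(self, speaker, text):
        self.messages.append((speaker, text))

    def as_prompt(self, bot_label="GPT"):
        lines = [f"{speaker}: {text}" for speaker, text in self.messages]
        lines.append(f"{bot_label}:")  # open slot for the model's reply
        return "\n".join(lines)


log = RollingHistory(max_messages=4)
for i in range(10):
    log.add("Human", f"message {i}")
print(log.as_prompt())
```

The prompt size stays roughly constant this way, at the cost of the model forgetting anything older than the window.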
I'm trying nlp with database queries to help it "remember". If it detects a certain topic or subject, it should query the database, gather the relevant information, and form a new prompt that will give background to the current text. That may work for you.
I do similarly, you can use embeddings to query the database
Yes, tokens would stack and eventually surpass the token limit, while also slightly compounding the cost of each request submitted this way.
I'm finding good success cutting the conversation to the last 3-5 messages, summarizing it all, and then of course using Embeddings to retrieve answers
Not the best but I'm able to stay pretty consistent with $0.01 per request
I'm still having issues with it understanding context and any exact references to a previous part of the conversation. However I'm also having good success storing the full transcript and falling back to a simple function
I do a similar approach, when the conversation history hits a certain limit it’s summarized and the chat history is started again with the summary at the beginning
And it keeps going, recursively summarizing
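A rough sketch of that recursive summarizing loop. The `summarize` callable is a placeholder for whatever you use to summarize (normally another model call), and character count is used as a crude stand-in for a real token count; both are assumptions for the example:

```python
class SummarizingHistory:
    """Collapses the history into a summary whenever it grows too long."""

    def __init__(self, summarize, max_chars=2000):
        self.summarize = summarize  # hypothetical: text -> shorter text
        self.max_chars = max_chars  # crude proxy for a token budget
        self.summary = ""
        self.turns = []

    def text(self):
        parts = []
        if self.summary:
            parts.append(f"Summary of earlier conversation: {self.summary}")
        parts.extend(self.turns)
        return "\n".join(parts)

    def add(self, speaker, message):
        self.turns.append(f"{speaker}: {message}")
        if len(self.text()) > self.max_chars:
            # The old summary is part of text(), so it gets re-summarized
            # together with the newer turns (the recursive part).
            self.summary = self.summarize(self.text())
            self.turns = []


# Stand-in summarizer for the demo; a real one would call the model.
history = SummarizingHistory(lambda text: text[:60] + "...", max_chars=200)
for i in range(20):
    history.add("Human", f"this is message number {i}")
print(history.text())
```

Each pass loses some detail, which is why exact references to old parts of the conversation tend to get fuzzy with this method.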
And I have another approach that embeds the conversation pairs and finds the X most relevant (time-ordered) pairs and constructs a prompt with them before sending to the API
@trim helm why time ordered? I'm doing the exact same besides that part. I'm currently trying to self-rate by contextual accuracy. Been difficult
Any suggestions?
Oh, is it for when it's lost the information from the conversation?
I needed it to be time ordered to have a cohesive conversation history. I embed each individual message, so the user prompt and the GPT reply would be 2 embeddings, and when retrieving the embeddings I don’t want them to be out of order
Basically I have a sliding window of X for embeddings and a static window of Y
The static window of Y always contains the Y most recent messages in the conversation, in order
Ahhh I see. Which method are you liking more?
The sliding window of X is to find X relevant prompts to put before the Y static window
So the stuff at the beginning of the conversation history is from embeddings, the found possibly relevant things, then the last bits of the history are always the last Y messages, to maintain cohesion
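Here is a sketch of how the two windows could be combined. It is not the GPT3Discord code, just one reading of the description above; the function names and the (text, vector) layout are made up for the example, and the vectors are assumed to come from whatever embedding model you use:

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_context(messages, query_vec, x_relevant=3, y_recent=4):
    """messages: list of (text, vector) tuples in chronological order."""
    split = max(len(messages) - y_recent, 0)
    older = list(enumerate(messages[:split]))  # keep the original positions
    recent = messages[split:]                  # static window, already in order

    # Rank the older messages by similarity to the new query...
    ranked = sorted(older, key=lambda item: cosine(item[1][1], query_vec),
                    reverse=True)
    # ...then put the chosen ones back into time order.
    chosen = sorted(ranked[:x_relevant], key=lambda item: item[0])

    return [msg[0] for _, msg in chosen] + [text for text, _ in recent]


# Toy 2-d vectors standing in for real embeddings.
messages = [
    ("a", [1.0, 0.1]), ("b", [0.0, 1.0]), ("c", [1.0, 0.0]),
    ("d", [0.0, 1.0]), ("e", [0.5, 0.5]), ("f", [0.0, 1.0]),
]
print(select_context(messages, [1.0, 0.0], x_relevant=2, y_recent=2))
```

In the toy data "c" is the closest match and "a" the second closest, but they still come out as a, c because of the re-sort by position, followed by the last two messages.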
The summarizations method is more stable tbh, but I’m working on fine tuning the embeddings to be better
Interesting. I'm using embeddings simply as a bank of previous conversation pairs, and then using DaVinci to rate them (I just pick the top 3 most related) by contextual accuracy.
Previous conversation pairs with everybody
That's an interesting approach. I would like to try it
I'm hoping that with enough time at least 40-50% of conversation pairs can be cached and save money. We'll see
Yeah, the main issue I found with your approach is that it loses conversation cohesion, bits of conversation will be out of order and it’ll make it hard for GPT to answer sometimes
Even ChatGPT has a hard time with rating. Or maybe my prompts are malformed. Been playing at it all day
@trim helm interesting you say that, that's my current issue. It seems to sometimes ignore the latest message and repeat another older answer even with the penalties set high
That's what I'm hoping the context rating will solve. I hope
Let me know the effectiveness 🙂
You can check out my project if you wanna see how I use embeddings, it’s called GPT3Discord, @bleak spade also has a project for conversation with embeddings out that’s much better than mine and more reliable and stable and consistent
I will. Thanks for showing me. It feels so nice knowing that I'm on ... Some sort of shared path 😂@trim helm
I'm hoping to place my project up as well. A silly graphical sprite that acts and responds to chat. A peasant with a history
And predisposition to blame everything on magic
In order for it to remember the conversation, should I send it all the messages that the AI and I have had every time it wants a new answer?
Yeah you have to create an artificial memory using prompt injection and/or embeddings
and why only 4000 tokens? there is still no way to increase it, is there?
You could try sending a message to OpenAI to increase your limit. Although you may want to consider optimizing your prompt instead.
I have a similar challenge but I want to be able to recall granular detail over a larger corpus. I am willing to take any workarounds necessary, i.e. creating embeddings or fine-tuning models but I'm relatively new at this and can't seem to find a way for my solution to look back past 4097 tokens without trying to reinvent the wheel in my code.
May I know how to use embeddings to query a database?
sure :)
you need to create an embedding vector for each of the things you want to search for
for example, you can create embeddings for blog post content, or messages in a chat app, each message or blog post will have one or more embeddings
(if a blog post is long, you need to create embeddings per post section instead of the entire post, because of token limits)
Then, you create an embedding for an input query, and then do a similarity check (search for "cosine similarity algorithm" for hand crafted, or use weaviate or pinecone, or faiss or something like that) of the input versus each item in the database.
You sort the results by cosine similarity, highest first. A score of 1 means the two vectors point in exactly the same direction (effectively an exact match), and a lower number means the item is less relevant to your query.
You take top x of these results, and present that to the user.
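The hand-crafted version of that search can be sketched in plain Python. The vectors here are tiny made-up ones; in practice each would come from your embedding model, and for a large database you would probably reach for one of the vector stores mentioned above instead of a linear scan:

```python
import heapq
import math


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec, items, k=3):
    """items: iterable of (label, vector). Returns (score, label), best first."""
    scored = ((cosine_similarity(query_vec, vec), label) for label, vec in items)
    return heapq.nlargest(k, scored, key=lambda pair: pair[0])


# Made-up 3-d vectors standing in for stored message/post embeddings.
database = [
    ("post about cats", [1.0, 0.0, 0.0]),
    ("post about dogs", [0.9, 0.1, 0.0]),
    ("post about taxes", [0.0, 0.0, 1.0]),
]
results = top_k([1.0, 0.0, 0.0], database, k=2)
for score, label in results:
    print(f"{score:.3f}  {label}")
```

The top results are what you would either show to the user or paste into the prompt as background for the next request.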
Do you have any sources you learned it from?
Because I read the documentation but it didn’t give me the help I needed
I certainly want to store past conversations for my chatbot
Oh okay, thank you, do you mind if I dm you if i have other questions?
You can ask them in here, may be easier, because I'm typically very busy
Or if you want privacy sure can DM too
@proven bane
Ok thank u