#Maintaining persistence in chats

thin bear
I am using the embeddings API and the chat API to answer questions over my own knowledge base, which consists of various documents. For the embeddings, I break my documents into smaller chunks, each sized in tokens to roughly 3000 words. For every question prompt, I compute cosine similarity, find the closest embedding, and pass the corresponding text chunk as context for the question.

The problem is that at times I would like to provide more context (text from multiple documents), and that context does not fit within the chat API's token limit. Is there some way to provide the relevant context across multiple prompts and then ask questions about it? As far as I can see, there is no persistence or session available. A chat ID is returned with each response, so I tried a prompt that referred to the chat IDs returned for the earlier prompts I had sent as context, but it could not recognize those prompts.

The other option is to break the text into even smaller chunks of, say, 300 words and then compute the embeddings. This is a bit of a bummer because of the sleep I will need to add once the 20 RPM limit is crossed. Any suggestions on how to solve this problem?
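
For reference, here is a rough sketch of the retrieval step described above, extended to pick several of the closest chunks instead of just one while staying under a size budget. It is only an illustration, not the exact code in use: the names `cosine_similarity`, `select_context`, and `budget_words` are hypothetical, the embeddings are toy 2-dimensional vectors rather than real API output, and word count stands in for a real token count.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_context(question_vec, chunks, budget_words):
    """Return chunk texts, most similar first, that fit in budget_words.

    chunks is a list of (text, embedding) pairs. Word count is used as a
    crude stand-in for tokens (an assumption made for this sketch).
    """
    ranked = sorted(
        chunks,
        key=lambda c: cosine_similarity(question_vec, c[1]),
        reverse=True,
    )
    selected, used = [], 0
    for text, _ in ranked:
        n_words = len(text.split())
        if used + n_words > budget_words:
            continue  # too large for the remaining budget; try the next chunk
        selected.append(text)
        used += n_words
    return selected


# Toy data: three short "chunks" with made-up 2-d embeddings.
chunks = [
    ("alpha beta gamma", [1.0, 0.0]),
    ("delta epsilon", [0.9, 0.1]),
    ("zeta eta theta iota", [0.0, 1.0]),
]
context = select_context([1.0, 0.0], chunks, budget_words=5)
print(context)
```

With smaller chunks, more of them fit in the same budget, which is the trade-off against the extra embedding requests mentioned above.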

wheat wind