I am using the embeddings API and chat API to get responses for own knowledgebase comprising of various documents. For embeddings, I break down my documents into smaller chunks and use token sizes corresponding to roughly 3000 words. Then for every question prompt, I do a cosine similarity and find the closest embedding and provide the relevant text chunk as the context for the question. The problem I am facing is that at times I would like to provide more context (text from multiple documents). This context does not fit into the token limit for chat API. Is there some way of providing the relevant context in multiple prompts and then asking questions on it? I see there is no persistence or session available. I do see a chat ID being returned and asked a prompt referring to the chat IDs returned in previous prompts that I provided as context. But it could not recognize those prompts. Other option for me is to break down the text into even smaller chunks of say 300 words and then calculate the embeddings. This is a bit of a bummer because of the sleep that I will need to incorporate after 20 RPM limit is crossed. Any suggestions on how to solve this problem?
#Maintaining persistence in chats
2 messages · Page 1 of 1 (latest)
The chat IDs are purely metadata that openai sends us back, we have no way to interact with them. You'll likely have to reduce your chunk sizes. If you add a payment method to the account you will have a higher rate limit.