# Best approach to create a RAG for all the private chats?


indigo rose

Given the unstructured nature of conversational data, I'm unsure about the most effective embedding strategy to use. The options I'm considering are:

Sentence-Level Embeddings: Embed each sentence separately to capture fine-grained detail. Matching then happens sentence against sentence, so an embedding may miss the broader context found in a paragraph or document.

Conversation-Level Embeddings: Concatenate the sentences of a conversation and embed the result like a traditional document, to capture the broader context of the entire conversation.

Summarization Before Embedding: Summarizing conversations before embedding to balance detail and context.
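
To make the three options concrete, here is a minimal sketch of the text units each one would hand to an embedding model. The `(speaker, text)` chat format, the function names, the naive regex sentence splitter, and the `summarize` callable are all assumptions for illustration; the embedding call itself is left out.

```python
import re
from typing import Callable

# Hypothetical chat format: a conversation is a list of (speaker, text) turns.
Conversation = list[tuple[str, str]]


def sentence_units(conv: Conversation) -> list[str]:
    """Sentence-level: one unit per sentence, prefixed with its speaker."""
    units = []
    for speaker, text in conv:
        # Naive split after ., ! or ? followed by whitespace (an assumption;
        # a proper sentence tokenizer would handle abbreviations etc.).
        for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
            if sentence:
                units.append(f"{speaker}: {sentence}")
    return units


def conversation_unit(conv: Conversation) -> str:
    """Conversation-level: all turns joined into one document-like unit."""
    return "\n".join(f"{speaker}: {text}" for speaker, text in conv)


def summary_unit(conv: Conversation, summarize: Callable[[str], str]) -> str:
    """Summarize-then-embed: `summarize` is a placeholder for whatever
    summarizer (e.g. an LLM call) would be used in practice."""
    return summarize(conversation_unit(conv))
```

For a two-turn chat, `sentence_units` yields one string per sentence, while `conversation_unit` yields a single string; whichever units are chosen would then be embedded and indexed.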

Could you share your recommendations or experiences regarding the best embedding approach for this scenario? Any specific techniques or methodologies that have proven effective in similar applications would be greatly appreciated.

hearty pier

I’ve also worked on a project involving search over conversational data, and this was one of the core problems I ran into and spent some time thinking about.

  1. You could index both sentence-level and “conversation”-level embeddings in separate vector DBs and route each query to the appropriate DB based on the level of context it likely needs.

  2. You could use only sentence-level embeddings but dynamically choose how many to retrieve, again based on the complexity of the query and the level of context needed.
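
Both options hinge on guessing how much context a query needs. A rough sketch of that decision follows, with loud caveats: the cue-word list, the 12-word threshold, the `k` values, and all function names are made up for illustration, and the in-memory `top_k` stands in for a real vector DB query.

```python
import math
import re

# Hypothetical cue words hinting that a query needs broad context.
BROAD_CUES = {"summarize", "summary", "overall", "why", "discussion", "history"}


def needs_broad_context(query: str) -> bool:
    """Toy heuristic: long queries or ones containing cue words count as broad."""
    words = re.findall(r"[a-z']+", query.lower())
    return len(words) > 12 or any(w in BROAD_CUES for w in words)


def route(query: str) -> str:
    """Option 1: choose which of the two indexes to search."""
    return "conversation" if needs_broad_context(query) else "sentence"


def choose_k(query: str, k_narrow: int = 3, k_broad: int = 10) -> int:
    """Option 2: single sentence index, more hits for broad queries."""
    return k_broad if needs_broad_context(query) else k_narrow


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], index: list[tuple[str, list[float]]], k: int) -> list[str]:
    """Return the k texts whose vectors are most similar to query_vec."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]
```

In practice the heuristic could be swapped for a small classifier or an LLM call; the point is only that the same "how much context?" signal can drive either the index choice or the retrieval count.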

indigo rose