# Metadata vs. adding additional context in the document when setting up a RAG pipeline
Hi @wintry kite. I'm kapa.ai, your friendly support bot. I will try to answer your question by searching through the knowledge base. :wave:
Whether to add context about the core data directly to the CSV file or to store it as metadata depends on the specific use case and the nature of the data.
When setting up a Retrieval-Augmented Generation (RAG) pipeline, the primary goal is to build an index that can be queried for the data relevant to a question. This usually involves creating vector embeddings, which are numerical representations of the meaning of your data, so that semantically similar text can be found by similarity search. Metadata complements this: it lets you filter or narrow the search so that the retrieved data is both accurate and contextually relevant.
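A minimal, self-contained sketch of the two ideas in this paragraph: embeddings for similarity search, and metadata for narrowing that search. The `embed` function is a toy bag-of-words stand-in (an assumption, not a real embedding model), and `add`, `query`, and the sample records are hypothetical names used only for illustration.

```python
import math
import re
from collections import Counter
from typing import Optional

def embed(text: str) -> Counter:
    # ASSUMPTION: a bag-of-words term-count vector stands in for a real
    # embedding model; it captures word overlap, not meaning.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Each entry in this toy index stores its text, its vector, and its metadata.
index = []

def add(text: str, metadata: dict) -> None:
    index.append({"text": text, "vector": embed(text), "metadata": metadata})

def query(question: str, filters: Optional[dict] = None, top_k: int = 2) -> list:
    q = embed(question)
    # Metadata filters narrow the candidate set (exact match on every key)...
    candidates = [
        entry for entry in index
        if all(entry["metadata"].get(k) == v for k, v in (filters or {}).items())
    ]
    # ...and vector similarity ranks whatever is left.
    candidates.sort(key=lambda entry: cosine(q, entry["vector"]), reverse=True)
    return [entry["text"] for entry in candidates[:top_k]]

# Hypothetical sample records.
add("Revenue grew 12% in Q3.", {"year": 2023, "source": "finance"})
add("Revenue grew 8% in Q3.", {"year": 2022, "source": "finance"})
add("Headcount stayed flat in Q3.", {"year": 2023, "source": "hr"})

print(query("How much did revenue grow in Q3?", filters={"year": 2023}, top_k=1))
```

Without the `year` filter, the two revenue sentences score identically on this toy embedding; the metadata is what disambiguates them, which is the role the paragraph above describes.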
If the additional context is crucial for understanding the core data and is frequently referenced in queries, it may make sense to include it directly in the CSV file, and therefore in the document text. Because the document text is what gets embedded, the context then contributes to similarity search, which makes each record more self-contained and can simplify querying.
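A sketch of this first approach, assuming a hypothetical CSV whose `product` and `review` columns are the core data and whose `category` and `region` columns are the additional context. The helper `row_to_text` is illustrative, not a library function; it writes the context into the document text itself, so it would be embedded together with the core data.

```python
import csv
import io

# Hypothetical CSV: "product" and "review" are the core data,
# "category" and "region" are the additional context.
CSV_DATA = """product,review,category,region
Widget A,Battery lasts two days,electronics,EU
Widget B,Strap broke after a week,accessories,US
"""

def row_to_text(row: dict, context_cols: list) -> str:
    # Prefix the core text with "key: value" pairs so the context becomes
    # part of the text that is embedded and searched.
    context = "; ".join(f"{col}: {row[col]}" for col in context_cols)
    return f"{context}\n{row['product']}: {row['review']}"

rows = list(csv.DictReader(io.StringIO(CSV_DATA)))
texts = [row_to_text(r, ["category", "region"]) for r in rows]
for t in texts:
    print(t)
    print("---")
```

The trade-off is that every vector now also encodes the context, so context that is rarely relevant to queries can dilute similarity scores for the core content.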
On the other hand, if the additional context is auxiliary information that is not always relevant to queries, it may be better to store it as metadata. This keeps the embedded text focused on the core data while still making the context available for filtering, or for passing to the LLM, when needed.
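The second approach, sketched with a hypothetical CSV of the same shape: only the core columns become the document text, and the context columns travel alongside as metadata that can be used for exact-match filtering. `row_to_record` and `filter_records` are illustrative helpers, not a library API.

```python
import csv
import io

# Hypothetical CSV: "product" and "review" are the core data,
# "category" and "region" are the additional context.
CSV_DATA = """product,review,category,region
Widget A,Battery lasts two days,electronics,EU
Widget B,Strap broke after a week,accessories,US
"""

def row_to_record(row: dict, context_cols: list) -> dict:
    # Only the core data goes into the text that would be embedded;
    # the context columns are kept next to it as metadata.
    text = f"{row['product']}: {row['review']}"
    metadata = {col: row[col] for col in context_cols}
    return {"text": text, "metadata": metadata}

def filter_records(records: list, **conditions) -> list:
    # Exact-match metadata filter, applied before (or instead of) vector search.
    return [
        r for r in records
        if all(r["metadata"].get(k) == v for k, v in conditions.items())
    ]

rows = list(csv.DictReader(io.StringIO(CSV_DATA)))
records = [row_to_record(r, ["category", "region"]) for r in rows]
eu_records = filter_records(records, region="EU")
print(eu_records)
```

Check your framework's defaults before relying on this split: LlamaIndex's `Document`, for example, includes metadata in the text sent to the embedding model unless the keys are listed in `excluded_embed_metadata_keys`.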
In the end, the choice between adding context to the CSV file and storing it as metadata should be guided by the nature of the data, the types of queries you expect, and the specific requirements of your RAG pipeline.
Unfortunately, the provided extracts do not contain specific examples or guidelines about when to use each approach. For more detailed advice, you might want to consult the documentation of the specific tools or libraries you are using, or ask for advice in relevant forums or communities.