# How to make sure the indexer is working?

summer isle

Hello, I have created a RAG on Azure with Blob Storage as well as an indexer that goes through the documents I have uploaded. The documents are PDF files containing plain text only. The titles of the files are somewhat similar to each other. Some titles state explicitly what the document is about, while others (e.g. books) do not. How can I best make sure that the RAG (indexer) goes through all documents and retrieves the most relevant ones?

Should I perhaps change the titles of the documents, or should I add descriptions at the beginning of the documents explaining the content? Or do I need to add another file briefly explaining what each document covers?

Thanks in advance!

junior cargo

Here are some Microsoft resources that can help you refine your RAG setup with Azure Blob Storage and an indexer:

Whether your RAG system retrieves the most relevant documents depends largely on which text and metadata your indexer extracts and which index fields that content ends up in.

Here are a few ways to improve relevance:

Enhance Metadata and Titles

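  • File names and titles only help retrieval if they end up in a searchable index field (for example, by mapping the blob's file name, exposed by the blob indexer as metadata_storage_name, to a title field). If some are unclear, renaming them with more descriptive titles can help.
  • Consider adding custom metadata to each blob. Azure Blob Storage supports user-defined key/value metadata, and the Azure AI Search blob indexer can map it to index fields, so you can provide additional context such as summaries, categories, or keywords.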
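A minimal sketch of the metadata approach, assuming the azure-storage-blob Python SDK. The catalog contents, field names (summary, category, keywords), and helper names are hypothetical. The validation is deliberately simplified: blob metadata names must be valid C# identifiers, and because metadata travels as HTTP headers, the sketch collapses newlines and rejects non-ASCII values.

```python
import re

# Hypothetical catalog: blob name -> descriptive fields you want searchable.
CATALOG = {
    "book-03.pdf": {
        "summary": "Introductory textbook on cost accounting",
        "category": "book",
        "keywords": "budgeting, cost allocation",
    },
}

# Simplified check: metadata names must be valid C# identifiers
# (reserved keywords are not rejected here).
_VALID_KEY = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_metadata(fields: dict[str, str]) -> dict[str, str]:
    """Validate and normalise fields before sending them as blob metadata."""
    meta = {}
    for key, value in fields.items():
        if not _VALID_KEY.fullmatch(key):
            raise ValueError(f"metadata key {key!r} is not a valid identifier")
        # Metadata is sent as HTTP headers, so collapse newlines/extra spaces.
        value = " ".join(value.split())
        if not value.isascii():
            raise ValueError(f"metadata value for {key!r} should be plain ASCII")
        meta[key] = value
    return meta


def apply_metadata(blob_client, fields: dict[str, str]) -> dict[str, str]:
    """Attach metadata to one blob.

    blob_client is assumed to be an azure.storage.blob.BlobClient.
    Note: set_blob_metadata replaces any metadata already on the blob.
    """
    meta = build_metadata(fields)
    blob_client.set_blob_metadata(metadata=meta)
    return meta
```

With the real SDK you would get a client from BlobServiceClient.from_connection_string(...).get_blob_client(container, blob_name) and loop over the catalog. After the next indexer run, the values should appear in the index if it has fields with matching names, or you can add explicit field mappings.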

Add Summaries or Descriptions Inside Documents

  • Prepending a short summary or descriptive introduction at the start of each document gives keyword and vector search more relevant text to match against, particularly in the first chunk of each document.
  • If books have ambiguous titles, including a table of contents or a brief synopsis within each PDF could enhance retrievability.
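If you cannot edit the PDFs themselves but control the text that goes into the index (for example, in a custom pipeline that pushes chunks), a similar effect can be sketched by prepending the description to the extracted text. Because RAG indexes usually store documents as chunks, a summary at the top of a file only reaches the first chunk, so this sketch repeats a short header in every chunk. The function name, character-based splitting, and size/overlap defaults are illustrative assumptions, not Azure's built-in chunking.

```python
def chunk_with_header(title: str, summary: str, body: str,
                      size: int = 1000, overlap: int = 100) -> list[str]:
    """Split body into overlapping character windows, each prefixed with a
    title/summary header so every chunk carries the document's context."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    header = f"Title: {title}\nSummary: {summary}\n\n"
    step = size - overlap
    chunks = []
    start = 0
    while True:
        chunks.append(header + body[start:start + size])
        if start + size >= len(body):  # this window reached the end of the text
            break
        start += step
    return chunks
```

Real pipelines usually split on sentences or tokens rather than raw characters; the point here is only that the descriptive header travels with every chunk.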

Use an External Mapping File

  • If modifying existing documents is difficult, consider creating a separate structured file (e.g., JSON or CSV) mapping titles to descriptions.
  • You could store this file in Azure Blob Storage and have your RAG pipeline look up the description of each retrieved document, so the model receives richer context alongside the retrieved text.
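A sketch of how such a mapping file might be used at query time, assuming a CSV with hypothetical columns file_name and description, and retrieved results represented as dicts with file_name and content keys. In practice you would download the CSV from Blob Storage once and cache it; here it is inlined so the example is self-contained.

```python
import csv
import io

# Hypothetical mapping file contents (normally downloaded from Blob Storage).
MAPPING_CSV = """file_name,description
book-03.pdf,Introductory textbook on cost accounting
policy-2023.pdf,Internal travel and expense policy
"""


def load_descriptions(text: str) -> dict[str, str]:
    """Parse the mapping CSV into {file_name: description}."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["file_name"]: row["description"] for row in reader}


def enrich(results: list[dict], descriptions: dict[str, str]) -> list[str]:
    """Prefix each retrieved chunk with its document description before
    the chunks are placed into the prompt."""
    enriched = []
    for r in results:
        desc = descriptions.get(r["file_name"], "No description available")
        enriched.append(f"[{r['file_name']}: {desc}]\n{r['content']}")
    return enriched
```

This leaves the index untouched; the alternative is to merge the descriptions into the index (for example as blob metadata) so they also influence which documents are retrieved, not just how they are presented to the model.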

Optimize Indexing Settings

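  • Experiment with semantic ranking in Azure AI Search, which re-ranks the initial results of a query with a language-understanding model so that contextually relevant documents move to the top. Its configuration also lets you designate a title field, which is where descriptive titles pay off.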
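A sketch of what a semantic configuration and a semantic query could look like over the Azure AI Search REST API (api-version 2023-11-01), using only the Python standard library. The service name, index name, configuration name, and the title/content/keywords fields are hypothetical, and semantic ranking must be enabled on the search service for such a query to work. The request is built but not sent.

```python
import json
import urllib.request

# Hypothetical names: replace with your own service, index, and field names.
SERVICE = "my-search-service"
INDEX = "rag-docs"
API_VERSION = "2023-11-01"

# Value for the index definition's "semantic" property: tells the ranker
# which field is the title and which fields hold content and keywords.
SEMANTIC_SETTINGS = {
    "configurations": [
        {
            "name": "rag-semantic",
            "prioritizedFields": {
                "titleField": {"fieldName": "title"},
                "prioritizedContentFields": [{"fieldName": "content"}],
                "prioritizedKeywordsFields": [{"fieldName": "keywords"}],
            },
        }
    ]
}


def semantic_query_body(text: str, top: int = 5) -> dict:
    """Build a search request body that asks for semantic re-ranking."""
    return {
        "search": text,
        "queryType": "semantic",
        "semanticConfiguration": "rag-semantic",
        "top": top,
    }


def build_request(text: str, api_key: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST to the Search Documents endpoint."""
    url = (f"https://{SERVICE}.search.windows.net/indexes/{INDEX}"
           f"/docs/search?api-version={API_VERSION}")
    data = json.dumps(semantic_query_body(text)).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )
```

Sending it with urllib.request.urlopen(build_request(...)) would return JSON results; comparing the top hits with and without "queryType": "semantic" on a handful of real questions is a simple way to judge whether re-ranking helps your documents.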