#How can we read images/tables in PDF and CSV/Excel files using LangChain?


slim temple
#

Is there a way to read images/tables in PDF, CSV, and Excel files so that we can implement RAG and ask questions about them?

torpid orbit
#
  1. Load the PDF with PyPDF (PyPDFLoader in LangChain)
  2. Split the documents using RecursiveCharacterTextSplitter
  3. Store the chunks in a vector store (FAISS, Chroma, etc.) with embeddings
  4. Retrieve the relevant chunks from the vector store
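
The four steps above can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: it assumes the `langchain-community`, `langchain-text-splitters`, `pypdf`, and `faiss-cpu` packages are installed, and that the caller passes in any LangChain embeddings object. The function name `build_retriever` and the chunk sizes are illustrative choices, not something from the thread.

```python
CHUNK_SIZE = 1000     # characters per chunk (illustrative value, tune for your data)
CHUNK_OVERLAP = 100   # characters shared between neighbouring chunks


def build_retriever(pdf_path, embeddings, k=4):
    """Load a PDF, split it, index it in FAISS, and return a retriever.

    `embeddings` is any LangChain embeddings object supplied by the caller.
    """
    # Imported inside the function so the sketch can be loaded even when
    # the optional third-party packages are not installed.
    from langchain_community.document_loaders import PyPDFLoader
    from langchain_community.vectorstores import FAISS
    from langchain_text_splitters import RecursiveCharacterTextSplitter

    # 1. Load: PyPDFLoader returns one Document per page.
    pages = PyPDFLoader(pdf_path).load()

    # 2. Split the pages into overlapping chunks.
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=CHUNK_SIZE, chunk_overlap=CHUNK_OVERLAP
    )
    chunks = splitter.split_documents(pages)

    # 3. Embed the chunks and store them in a FAISS index.
    store = FAISS.from_documents(chunks, embeddings)

    # 4. Expose the index as a retriever returning the top-k chunks.
    return store.as_retriever(search_kwargs={"k": k})
```

With recent LangChain versions, the result could be queried with something like `build_retriever("report.pdf", embeddings).invoke("your question")`, where `report.pdf` is a hypothetical file. Note that this route works on extracted text, so table layout may arrive flattened.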
slim temple
torpid orbit
#

You can convert the PDF into Markdown and then use the Markdown document loaders. I don't know about the tables part, but it should work for the images.

#

Yes, I can confirm it works with tables.
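
For the CSV side of the question, the same Markdown idea can be applied by rendering the rows as a Markdown table before loading them. The following is a small sketch using only the Python standard library. The helper name `csv_to_markdown` is hypothetical, and it assumes the first CSV row is the header.

```python
import csv
import io


def csv_to_markdown(csv_text: str) -> str:
    """Render CSV text as a Markdown table (first row is treated as the header)."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ""
    header, *body = rows
    width = len(header)

    def fmt(cells):
        # Pad or trim each row to the header width, and escape pipes
        # so cell contents cannot break the table.
        cells = (list(cells) + [""] * width)[:width]
        return "| " + " | ".join(c.strip().replace("|", "\\|") for c in cells) + " |"

    lines = [fmt(header), "| " + " | ".join(["---"] * width) + " |"]
    lines += [fmt(row) for row in body]
    return "\n".join(lines)


sample = "name,qty\nbolt,12\nnut,30\n"
print(csv_to_markdown(sample))
```

The resulting Markdown text can then be split and embedded like any other document, which keeps the column headers next to the values they describe.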