#clarification on embeddings and data retrieval


spring blaze
#

hello community!
I have a possibly dumb point of confusion that I hope someone can clear up.
I'm using LangChain and OpenSearch for my project, with OpenAI embeddings to embed my data:
`db = OpenSearchVectorSearch.from_documents(
    langchain_docs,
    embedding=embeddings,
    opensearch_url="http://localhost:9200",
    index_name="netskope_docs_langchain",
)`

Here is where the confusion lies: my understanding is that I only need to compute the document embeddings once, so I call the OpenAI API for the initial embedding calculation, when I write the vectors to the vector store.
The question is: every time I retrieve something from the vector store, do I call the OpenAI API again, or does it all happen locally?

thank you!

junior basin
#

Looks like it may call the OpenAI API if you pass text as the search query to the vector database (see here: https://python.langchain.com/docs/modules/data_connection/vectorstores/integrations/opensearch#similarity_search-using-painless-scripting)


fresh seal
spring blaze
#

Thank you both!
So to summarize my understanding:

  • when vectors are first added to the vector store, the OpenAI API is used to compute the document embeddings
  • on every subsequent retrieval based on a text query, the OpenAI API is also used, to compute the query embedding
  • the vector store itself handles retrieving the most relevant documents
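
To make the three points above concrete, here is a minimal toy sketch, not LangChain or OpenSearch code. `CountingEmbedder` and `ToyVectorStore` are hypothetical stand-ins I made up for illustration: the embedder plays the role of the remote embedding API and just counts its calls, and the "embedding" is a letter-frequency vector rather than a real model.

```python
import math


class CountingEmbedder:
    """Hypothetical stand-in for a remote embedding API; counts its calls."""

    def __init__(self):
        self.calls = 0

    def embed(self, text):
        self.calls += 1
        # Toy embedding: letter-frequency vector over a-z (not a real model).
        vec = [0.0] * 26
        for ch in text.lower():
            if "a" <= ch <= "z":
                vec[ord(ch) - ord("a")] += 1.0
        return vec


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """Hypothetical in-memory store: keeps (text, vector) pairs."""

    def __init__(self, embedder):
        self.embedder = embedder
        self.rows = []

    def add_documents(self, texts):
        # Index time: one embedding call per document.
        for text in texts:
            self.rows.append((text, self.embedder.embed(text)))

    def search_by_vector(self, vector, k=1):
        # Pure local work: compare the query vector against stored vectors.
        ranked = sorted(self.rows, key=lambda r: cosine(vector, r[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

    def search(self, query, k=1):
        # Query time: the text query must be embedded first (one more call).
        return self.search_by_vector(self.embedder.embed(query), k)


embedder = CountingEmbedder()
store = ToyVectorStore(embedder)
store.add_documents(["opensearch vector index", "billing and invoices"])
calls_after_indexing = embedder.calls

top = store.search("vector search")
calls_after_query = embedder.calls
```

The call counter goes up once per document at indexing time and once more for the text query, while the ranking step itself never touches the embedder. The stored document vectors are never recomputed.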
junior basin
# fresh seal The whole point of using a vector store / db, is so that you can "store" your ve...

I think the point for him is that he'll probably use the OpenAI API to embed raw text if that's his input (which will cost him $$$). Otherwise, if he already has the query embeddings on hand, retrieval is handled entirely by the DB itself. That said, I haven't used LangChain, so I'd have to dig into the code and documentation to see whether it accepts embeddings as queries to the DB.
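
One way to have query embeddings "on hand" is to cache them, so a repeated query text only costs one remote call. The sketch below is a rough illustration under my own assumptions: `CachedQueryEmbedder` and `fake_remote_embed` are hypothetical names, and the fake function just returns a made-up two-number vector instead of calling any real API. On the LangChain side, the base vector store interface defines a `similarity_search_by_vector` method, but whether the OpenSearch integration supports it in a given version is something to check in its documentation.

```python
class CachedQueryEmbedder:
    """Hypothetical wrapper: remembers embeddings for query texts already seen."""

    def __init__(self, embed_fn):
        self._embed_fn = embed_fn
        self._cache = {}
        self.remote_calls = 0

    def embed_query(self, text):
        # Only call the (paid) embedding function on a cache miss.
        if text not in self._cache:
            self.remote_calls += 1
            self._cache[text] = self._embed_fn(text)
        return self._cache[text]


def fake_remote_embed(text):
    # Placeholder for a real embedding API call; returns a made-up vector.
    return [float(len(text)), float(text.count(" "))]


cached = CachedQueryEmbedder(fake_remote_embed)
first = cached.embed_query("how do I configure the index?")
second = cached.embed_query("how do I configure the index?")
```

The second lookup is served from the in-memory dictionary, so `remote_calls` stays at one. The resulting vector could then be handed to a search-by-vector call, which needs no embedding API at all.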