clarification on embeddings and data retrieval | Learn AI Together | Page 1

spring blaze Jun 26, 2023, 2:17 PM

Hello community!
I have a possibly dumb point of confusion that I hope someone can clear up.
I'm using LangChain and OpenSearch for my project, with OpenAI embeddings to embed my data:
`db = OpenSearchVectorSearch.from_documents(
    langchain_docs,
    embedding=embeddings,
    opensearch_url="http://localhost:9200",
    index_name="netskope_docs_langchain",
)`

Here is where the confusion lies: my understanding is that I only need to compute the embeddings once, so I use the OpenAI API for the initial embedding calculation, when I write the vectors to the vector store.
The question is: every time I retrieve something from the vector store, do I call the OpenAI API again, or does it all happen locally?

Thank you!

junior basin Jun 27, 2023, 12:05 AM

It looks like it may use the OpenAI API if you pass text as the search query to the vector database (see here: https://python.langchain.com/docs/modules/data_connection/vectorstores/integrations/opensearch#similarity_search-using-painless-scripting)

fresh seal Jun 27, 2023, 5:12 AM

spring blaze hello community! I have sort of a dumb confusion and i hope I can clarify this s...

The whole point of using a vector store / DB is that you can "store" your vectors. While the LangChain paradigm makes it feel like you're instantiating the store every time, in the actual system you don't recompute the document vectors.

spring blaze Jun 27, 2023, 5:23 AM

Thank you both!
So to summarize my understanding:

At the initial adding of vectors to the vector store, the OpenAI API is used to calculate the vectors.
At any subsequent retrieval based on a text query, the OpenAI API is also used, to calculate the query embedding.
The vector store provides the capability to retrieve the most relevant documents.
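
To make the three points above concrete, here is a minimal sketch in plain Python. It does not use LangChain, OpenSearch, or the real OpenAI API: `CountingEmbedder` and `TinyVectorStore` are hypothetical stand-ins, and the letter-frequency "embedding" is a toy chosen only so the example runs offline. The counter shows when an embedding call would be made.

```python
import math
from typing import List, Tuple


class CountingEmbedder:
    """Hypothetical stand-in for a paid embedding API; counts its own calls."""

    def __init__(self) -> None:
        self.api_calls = 0

    def embed(self, text: str) -> List[float]:
        self.api_calls += 1
        # Toy embedding (letter frequencies), NOT a real model.
        vec = [0.0] * 26
        for ch in text.lower():
            if "a" <= ch <= "z":
                vec[ord(ch) - ord("a")] += 1.0
        return vec


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class TinyVectorStore:
    """Hypothetical in-memory store: keeps (text, vector) pairs."""

    def __init__(self, embedder: CountingEmbedder) -> None:
        self.embedder = embedder
        self.rows: List[Tuple[str, List[float]]] = []

    def add_texts(self, texts: List[str]) -> None:
        # Indexing: one embedding call per document, done once.
        for text in texts:
            self.rows.append((text, self.embedder.embed(text)))

    def similarity_search(self, query: str, k: int = 1) -> List[str]:
        # Retrieval: the query text is embedded (one call), then the
        # ranking itself runs locally against the stored vectors.
        qv = self.embedder.embed(query)
        ranked = sorted(self.rows, key=lambda r: cosine(qv, r[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


embedder = CountingEmbedder()
store = TinyVectorStore(embedder)
store.add_texts(["reset your password", "configure the firewall", "export audit logs"])
calls_after_indexing = embedder.api_calls

top = store.similarity_search("password reset", k=1)
store.similarity_search("firewall configuration", k=1)
calls_after_queries = embedder.api_calls

print(calls_after_indexing, calls_after_queries, top)
```

In this sketch the document vectors are never recomputed after indexing; only the incoming query text costs an extra embedding call each time.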

junior basin Jun 27, 2023, 4:43 PM

fresh seal The whole point of using a vector store / db, is so that you can "store" your ve...

I think the thing for him is that he's probably going to use the OpenAI API to embed raw text if that is his input (which will cost him $$$). Otherwise, if he already has the embeddings on hand, retrieval is handled by the DB's local storage. That said, I haven't used LangChain, so I'd have to dive into the code and documentation to see whether it accepts embeddings as queries to the DB.
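
As a rough illustration of the "embeddings on hand" case, the sketch below ranks stored vectors against a query vector that already exists, so no embedding call is needed at all. Everything here is hypothetical: `search_by_vector`, the document IDs, and the 3-dimensional toy vectors are made up for the example. LangChain's base `VectorStore` interface does declare a `similarity_search_by_vector` method, but whether the OpenSearch integration implements it in a given version is something to check in its documentation.

```python
import math
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def search_by_vector(
    index: Dict[str, List[float]], query_vector: List[float], k: int = 1
) -> List[str]:
    """Rank stored document IDs by cosine similarity to a precomputed vector."""
    ranked = sorted(
        index, key=lambda doc_id: cosine(query_vector, index[doc_id]), reverse=True
    )
    return ranked[:k]


# Vectors computed once at indexing time and stored (toy values).
stored = {
    "doc-a": [0.9, 0.1, 0.0],
    "doc-b": [0.0, 1.0, 0.2],
    "doc-c": [0.1, 0.0, 1.0],
}

# A query vector that was cached or computed earlier (assumption).
cached_query_vector = [0.8, 0.2, 0.1]

hits = search_by_vector(stored, cached_query_vector, k=2)
print(hits)
```

The search step is pure local arithmetic; the only API cost in the text-query case comes from turning the query text into a vector first.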
