#Migrate database to vector db or keep two in sync?


wispy anchor
#

I want to make my database searchable with similarity search. I currently have a list of service offerings in MongoDB, and I'm considering using Pinecone or something similar to play around with LangChain.

What's the standard practice? Make the vector database the single source of truth, or keep it alongside your main one?

low merlin
#

I'd keep MongoDB as the source of truth. Remember that a vector DB stores your embeddings, and those embeddings come from pretrained language models (e.g. BERT or OpenAI's embedding models). Each model has a maximum context/input length it can handle. So if a document in MongoDB is longer than that, it has to be broken up before being passed to the embedding model, meaning your one entry in MongoDB becomes multiple entries in your vector DB.

#

I'm not sure if Pinecone does this for you, but keep track of which piece of your text each embedding maps to.
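
To make the one-to-many point concrete, here is a minimal sketch of splitting one long document into chunks while remembering where each chunk came from. It assumes a plain word count as a stand-in for the model's real token limit, and the names `Chunk`, `split_into_chunks`, `source_id`, and `max_words` are hypothetical, not part of any library mentioned above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    source_id: str    # hypothetical: the id of the original MongoDB document
    chunk_index: int  # position of this chunk within that document
    text: str         # the piece of text that would be sent to the embedding model


def split_into_chunks(source_id: str, text: str, max_words: int = 200) -> List[Chunk]:
    """Split text into pieces of at most max_words words.

    Assumption: a word count approximates the embedding model's input limit.
    A real setup would count tokens with the model's own tokenizer.
    """
    words = text.split()
    chunks: List[Chunk] = []
    for start in range(0, len(words), max_words):
        piece = " ".join(words[start:start + max_words])
        chunks.append(Chunk(source_id, len(chunks), piece))
    return chunks
```

Each `Chunk` would get its own embedding, and the `(source_id, chunk_index)` pair is what lets you get from a search hit back to the original record.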

wispy anchor
#

Hi Ghost, thanks for the quick reply. So you would recommend a one-to-many relationship between my MongoDB documents and the embeddings? Each "service" item in my DB has multiple sections, so maybe each could be its own embedding. I suppose in my vector DB I can attach metadata such as an ID, etc., correct? (Sorry, totally new to this.)

low merlin
#

No problem. I think if you want to start with vector stores, use a free local one like FAISS to test out what you'll need to keep track of in your DB.

#

While FAISS is much more bare-bones than most other solutions, you can use it for free and get a better understanding of what you'll need to keep track of when using vector DBs.
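
As a rough illustration of that bookkeeping, the sketch below uses a tiny pure-Python stand-in for a bare-bones index rather than FAISS itself. Like a bare index, it stores only vectors and returns row positions, so the mapping back to the source documents has to live in a side table you maintain yourself. `TinyIndex`, `position_to_source`, and the toy 2-dimensional vectors are all made up for the example; real embeddings would come from an embedding model.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TinyIndex:
    """Hypothetical stand-in for a bare-bones vector index: vectors in, row positions out."""

    def __init__(self) -> None:
        self.vectors: List[List[float]] = []

    def add(self, vector: List[float]) -> int:
        self.vectors.append(vector)
        return len(self.vectors) - 1  # row position of the vector just added

    def search(self, query: List[float], k: int) -> List[int]:
        scored = [(cosine(query, v), pos) for pos, v in enumerate(self.vectors)]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [pos for _, pos in scored[:k]]


index = TinyIndex()
# Side table you maintain yourself: row position -> (source document id, chunk index).
position_to_source: Dict[int, Tuple[str, int]] = {}

# Toy data: two chunks from one service document, one chunk from another.
toy_chunks = [
    ("svc-1", 0, [1.0, 0.0]),
    ("svc-1", 1, [0.9, 0.1]),
    ("svc-2", 0, [0.0, 1.0]),
]
for source_id, chunk_index, vector in toy_chunks:
    position = index.add(vector)
    position_to_source[position] = (source_id, chunk_index)

hits = index.search([1.0, 0.05], k=2)
sources = [position_to_source[p] for p in hits]
print(sources)  # [('svc-1', 0), ('svc-1', 1)]
```

With the source ids in hand, the full records can then be fetched from MongoDB, which stays the source of truth.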