#How does vector store searching work


proven lynx
#

I’ve created a vector store containing textbooks for an assistant. When I call the assistant, I give it two things:

  1. The topic to get the content for
  2. A description of the format in which it should output the retrieved content (for example, bolding key points).

My question is: will ChatGPT use the entire prompt for its similarity search, or only the topic?
In other words, how does it perform the similarity search?

vast boneBOT
#

Hi @proven lynx. I'm kapa.ai, your friendly support bot. I will try to answer your question by searching through the knowledge base. :wave:

#

The process of vector store searching involves several steps:

  1. Indexing: Each chunk of text (in your case, sections of textbooks) is converted into an embedding vector using an OpenAI embedding model. These vectors are saved in the Vector Store for later use in searching. Some metadata, such as the author's name or other pre-computed tags, can also be stored alongside the vectors to allow for search customization [source (https://cookbook.openai.com/examples/vector_databases/cassandra_astradb/philosophical_quotes_cql#how-it-works)].

  2. Search: To find a chunk of text similar to the provided search query (in your case, the topic), the query is converted into an embedding vector on the fly. This vector is then used to query the Vector Store for similar vectors, i.e., similar chunks of text that were previously indexed. The key point here is that "chunks of text similar in content" translates, in vector space, to vectors that are metrically close to each other. Thus, vector similarity search effectively implements semantic similarity [source (https://cookbook.openai.com/examples/vector_databases/cassandra_astradb/philosophical_quotes_cql#how-it-works)].

  3. Generation: Given an input (in the cited example, a topic or a tentative quote), the search step is performed, and the top results (chunks of text) are inserted into an LLM prompt that asks the generative model to produce new text along the lines of the retrieved examples and the initial input [source (https://cookbook.openai.com/examples/vector_databases/cassandra_astradb/philosophical_quotes_cql#how-it-works)].
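
The indexing and search steps above can be sketched in a few lines. This is only an illustration under stated assumptions, not how OpenAI's vector store is implemented: `toy_embed` is a hypothetical word-count stand-in for a real embedding model, and the chunk texts are invented for the example.

```python
import math
import re
from collections import Counter

def toy_embed(text):
    """Hypothetical stand-in for an embedding model: a sparse word-count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Indexing: embed every chunk once and keep the vector next to the text.
chunks = [
    "Photosynthesis converts light energy into chemical energy in plants.",
    "The French Revolution began in 1789 and reshaped European politics.",
    "Mitochondria produce energy for the cell through respiration.",
]
index = [(chunk, toy_embed(chunk)) for chunk in chunks]

def search(query, k=2):
    """Search: embed the query on the fly and return the k closest chunks."""
    q = toy_embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

top = search("photosynthesis in plants", k=1)
print(top)
```

With a real embedding model the vectors are dense and capture meaning rather than shared words, but the flow is the same: embed once at indexing time, embed the query at search time, and rank by vector closeness.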

#

As for your question about whether ChatGPT uses the entire prompt for the similarity search or only the topic: it depends on how you've set up your system. If you pass the entire prompt (topic + format instructions) as the search query, the similarity search is based on the entire prompt. If you pass only the topic as the search query, the similarity search is based on the topic alone.

Remember, the goal of the similarity search is to find chunks of text in your Vector Store that are semantically similar to your search query. So, if your format instructions are not relevant to the semantic content of your textbooks, it might be more effective to use only the topic as your search query.
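
A rough way to see why format instructions can dilute the query is to compare similarity scores for the topic alone and for the topic plus instructions. The sketch below assumes a hypothetical word-count embedding (`toy_embed`) and made-up texts; a real embedding model will give different numbers, and the effect may be smaller or larger.

```python
import math
import re
from collections import Counter

def toy_embed(text):
    """Hypothetical stand-in for an embedding model: a sparse word-count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Invented example inputs: a topic, some format instructions, one textbook chunk.
topic = "photosynthesis in plants"
format_instructions = "Use bold text for key points and output the answer as a bulleted list."
full_prompt = topic + ". " + format_instructions
chunk = "Photosynthesis converts light energy into chemical energy in plants."

chunk_vec = toy_embed(chunk)
score_topic = cosine(toy_embed(topic), chunk_vec)
score_full = cosine(toy_embed(full_prompt), chunk_vec)

# The extra instruction words add nothing the chunk shares, so the score drops.
print(f"topic only:  {score_topic:.3f}")
print(f"full prompt: {score_full:.3f}")
```

In this toy setup the topic-only query scores higher against the relevant chunk, which is the intuition behind keeping format instructions out of the search query.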

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:

#


proven lynx
#

@vast bone my question is: is the entire prompt fed to the search, or only the topic part?

vast boneBOT
#

Let me check the knowledge base for you. One moment! :books:

#

The information provided in the knowledge sources does not specify whether the entire prompt or only the topic part is used for the similarity search in the vector store. The implementation details can vary based on how you've set up your system.

However, in general, the part of the prompt used for the similarity search should be the part that contains the semantic content you're interested in retrieving from your vector store. If the format instructions do not contribute to this semantic content, it might be more effective to use only the topic part as your search query.

For more detailed information on how to work with large language models and control their outputs, you can refer to the OpenAI Cookbook's guide on this topic (https://cookbook.openai.com/articles/how_to_work_with_large_language_models#completion-prompt-example).