# How to use customized embeddings from the OpenAI Cookbook?

ember vortex

Context:
I am following this OpenAI cookbook for customizing embeddings:
openai-cookbook/blob/main/examples/Customizing_embeddings.ipynb

Here is my use case:
We have a huge number of private documents (~400,000) filled with organization-specific terminology and knowledge.
We have already split the documents into chunks, converted each chunk to an embedding, and stored the embeddings in a VectorStore (OpenSearch).
However, at retrieval time, many irrelevant documents are returned as similar to the query.
To address this, we plan to create a bias matrix as described in the OpenAI Cookbook for customizing embeddings.

Question:
The bias matrix generated in the cookbook (called `best_matrix` in the code) has shape (1536, 2048).
The embedding generated by the Ada model has shape (1, 1536).
What exactly should my next steps be?

Possible next steps:
Step 1: Multiply the embedding of every chunk (shape (1, 1536)) by the bias matrix (shape (1536, 2048)), which results in a matrix of shape (1, 2048).

Step 2: Insert the result of the multiplication (i.e., the (1, 2048) matrix) into the vector store as the embedding for that chunk.

Step 3: When querying the vector store, apply Step 1 to the embedding generated for the query as well, and search with the resulting (1, 2048) vector.
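The three steps above can be sketched end to end with NumPy. This is a minimal illustration under stated assumptions, not OpenSearch code: random arrays stand in for real Ada embeddings and for the cookbook's trained matrix, a brute-force cosine search stands in for the vector store, and the helper names `customize` and `cosine_top_k` are hypothetical.

```python
import numpy as np

# Placeholder data (assumption): real values would come from the embeddings
# API (1536-dim) and from the cookbook's trained matrix (1536 x 2048).
rng = np.random.default_rng(0)
chunk_embeddings = rng.normal(size=(5, 1536))  # one row per chunk
bias_matrix = rng.normal(size=(1536, 2048))    # stand-in for `best_matrix`
query_embedding = rng.normal(size=(1, 1536))


def customize(embeddings: np.ndarray, matrix: np.ndarray) -> np.ndarray:
    """Step 1 (hypothetical helper): project (n, 1536) embeddings to (n, 2048)."""
    return embeddings @ matrix


def cosine_top_k(query: np.ndarray, docs: np.ndarray, k: int) -> np.ndarray:
    """Return indices of the k rows of `docs` most cosine-similar to `query`."""
    q = query / np.linalg.norm(query, axis=1, keepdims=True)
    d = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    scores = (d @ q.T).ravel()  # shape (n,)
    return np.argsort(-scores)[:k]


# Step 2: these (n, 2048) vectors are what would be written to the vector store.
stored = customize(chunk_embeddings, bias_matrix)

# Step 3: transform the query with the SAME matrix, then search.
query_custom = customize(query_embedding, bias_matrix)
top = cosine_top_k(query_custom, stored, k=3)
print(stored.shape, query_custom.shape, top)
```

The key design constraint is that chunks and queries go through the identical matrix; mixing raw 1536-dim query vectors with transformed 2048-dim stored vectors would not even have matching shapes.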

midnight grove (BOT)

Hi @ember vortex. I'm kapa.ai, your friendly support bot. I will try to answer your question by searching through the knowledge base. :wave:

Your proposed steps seem to be in line with the information provided in the extracts. Here's a more detailed breakdown based on the information available:

  1. Customize Embeddings: As per the OpenAI Cookbook (https://cookbook.openai.com/examples/Customizing_embeddings.ipynb), you can train a custom matrix (bias matrix) to multiply your embedding vectors by, in order to get new customized embeddings. This custom matrix will help emphasize the features relevant to your training labels.
```python
import numpy as np
# Assuming `embeddings` has shape (n, 1536) and `bias_matrix` has shape (1536, 2048)
custom_embeddings = np.matmul(embeddings, bias_matrix)  # -> shape (n, 2048)
```
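  2. Store Custom Embeddings: Once you have your customized embeddings, you can store them in your VectorStore (OpenSearch). Make sure the vector field in your index is configured with a dimension of 2048 (the matrix's output size) rather than 1536.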

  3. Querying: When querying the VectorStore, you should also transform the query's embedding using the same bias matrix before performing the search. This ensures that the query and the stored embeddings are in the same transformed space, improving the relevance of the search results.
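For OpenSearch specifically, the stored vectors and the query need a matching index configuration. The sketch below only builds request bodies as Python dicts (no client calls). The field name `embedding`, the helper `knn_query`, and the choice of HNSW with the `nmslib` engine are assumptions to verify against the k-NN documentation for your OpenSearch version. Cosine similarity (`cosinesimil`) is chosen because, although OpenAI embeddings are normalized to length 1, vectors produced by multiplying with the matrix generally are not.

```python
import numpy as np

FIELD = "embedding"  # hypothetical field name; adjust to your index
CUSTOM_DIM = 2048    # must equal bias_matrix.shape[1], not 1536

# Sketch of a k-NN index body for the customized vectors.
# Assumption: engine/space_type support varies across OpenSearch versions.
index_body = {
    "settings": {"index": {"knn": True}},
    "mappings": {
        "properties": {
            FIELD: {
                "type": "knn_vector",
                "dimension": CUSTOM_DIM,
                # Cosine, because transformed vectors are no longer unit length.
                "method": {"name": "hnsw", "space_type": "cosinesimil", "engine": "nmslib"},
            }
        }
    },
}


def knn_query(query_embedding: np.ndarray, bias_matrix: np.ndarray, k: int = 10) -> dict:
    """Hypothetical helper: build a k-NN search body from a raw (1, 1536) query embedding."""
    custom = (query_embedding @ bias_matrix).ravel()  # same transform as the chunks
    return {"size": k, "query": {"knn": {FIELD: {"vector": custom.tolist(), "k": k}}}}
```

With the opensearch-py client, these bodies would be passed to `client.indices.create(index=..., body=index_body)` and `client.search(index=..., body=knn_query(...))`. Because a field's mapped dimension cannot be changed in place, the existing 1536-dim vectors would need to go into a new index (or a new field) created with this mapping.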

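Remember, the effectiveness of this approach depends on the quality of your training data. In the cookbook, the matrix is trained on pairs of texts labeled as similar or dissimilar, so you will need such labeled pairs drawn from your own documents and queries for the matrix to emphasize the features that matter in your domain.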
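To make the training objective concrete: the cookbook trains the matrix in PyTorch so that the cosine similarity of each transformed pair matches its label (+1 for similar, -1 for dissimilar), minimizing mean squared error. The NumPy sketch below only scores a given matrix against labeled pairs; it omits the cookbook's dropout and gradient updates, and `pair_loss` is a hypothetical name.

```python
import numpy as np


def cosine_rows(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Row-wise cosine similarity between two (n, d) arrays."""
    num = np.sum(a * b, axis=1)
    return num / (np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1))


def pair_loss(e1: np.ndarray, e2: np.ndarray, labels: np.ndarray, matrix: np.ndarray) -> float:
    """MSE between each pair's customized cosine similarity and its label.

    e1, e2: (n, 1536) raw embeddings of the two texts in each pair.
    labels: (n,) array of +1 (similar) or -1 (dissimilar).
    matrix: (1536, m) candidate matrix to evaluate.
    """
    sims = cosine_rows(e1 @ matrix, e2 @ matrix)
    return float(np.mean((sims - labels) ** 2))
```

Scoring held-out pairs this way gives a simple check before re-indexing 400,000 chunks. For example, `pair_loss(e1, e2, labels, np.eye(1536))` is the baseline for the uncustomized embeddings, since the identity matrix leaves them unchanged; a trained matrix should beat it.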

Also, keep in mind that only the first dimension of the matrix is fixed: it must equal your embedding dimension (1536 for Ada). The second dimension (2048) is a hyperparameter chosen in the cookbook's training code, so your matrix may have a different output size depending on how you train it.
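Written out, the transformation and the similarity it induces look like this, where $B$ is the trained matrix and $m$ its output size (notation introduced here for illustration):

$$
\tilde{e} = e\,B, \qquad e \in \mathbb{R}^{1 \times 1536},\quad B \in \mathbb{R}^{1536 \times m},\quad \tilde{e} \in \mathbb{R}^{1 \times m}
$$

$$
\operatorname{sim}(q, d) = \frac{(qB)\,(dB)^\top}{\lVert qB \rVert\,\lVert dB \rVert} = \frac{q\,BB^\top d^\top}{\lVert qB \rVert\,\lVert dB \rVert}
$$

So the customized score is a learned bilinear form over the original embeddings, and with $B = I$ (so $m = 1536$) it reduces to ordinary cosine similarity.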

Lastly, it's important to note that the information provided is based on the extracts given and might not cover all aspects of your specific use case. For more detailed guidance, you might want to refer to the full documentation and examples provided by OpenAI.

Note: you can ask me a follow-up question by @ mentioning me again :speech_balloon: