Context :
I am following this cookbook from OpenAI for customizing embeddings :
openai-cookbook/blob/main/examples/Customizing_embeddings.ipynb
Here is my use case :
We have a huge number of private documents (~400,000) that are filled with organization-specific words and knowledge.
We have already divided the documents into chunks, converted each chunk to an embedding, and stored the embeddings in a vector store (OpenSearch).
However, during retrieval, a lot of irrelevant documents are scored as similar to the query.
To address that, we are planning to create a bias matrix as described in the OpenAI Cookbook for customizing embeddings.
Question :
The bias matrix generated as part of the cookbook (it is referred to as the best matrix in the code) has shape (1536, 2048).
The embedding generated using the ADA model has shape (1, 1536).
What should be my precise next steps ?
Possible Next Steps :
Step 1 : Multiply the embedding of every chunk (shape (1, 1536)) by the bias matrix (shape (1536, 2048)), which results in a matrix of shape (1, 2048).
Step 2 : Insert the result of the multiplication (i.e. the (1, 2048) matrix) into the vector store as the embedding for that chunk.
Step 3 : While querying the vector store, follow Step 1 and Step 2 again for the embedding generated for the query.
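
To make the proposed steps concrete, here is a minimal NumPy sketch of what I have in mind. It uses random placeholder arrays in place of the real learned matrix and the real ADA embeddings, and the names (bias_matrix, chunk_embeddings, query_embedding, customize) are hypothetical. The L2 normalization after the multiplication is my own assumption, added so that a dot product behaves like cosine similarity. If this is right, the vector field in the index would presumably need dimension 2048 instead of 1536.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder data (assumption): stand-ins for the learned matrix
# and for real ADA embeddings, using the shapes from the question.
bias_matrix = rng.normal(size=(1536, 2048))
chunk_embeddings = rng.normal(size=(5, 1536))   # 5 hypothetical chunks
query_embedding = rng.normal(size=(1, 1536))


def customize(embeddings, matrix):
    """Project embeddings through the matrix, then L2-normalize each row.

    The normalization is an assumption so that dot products between
    customized vectors equal their cosine similarity.
    """
    projected = embeddings @ matrix              # (n, 1536) @ (1536, 2048) -> (n, 2048)
    norms = np.linalg.norm(projected, axis=1, keepdims=True)
    return projected / norms


# Step 1 and Step 2: transform every chunk; these are the vectors to store.
custom_chunks = customize(chunk_embeddings, bias_matrix)

# Step 3: apply the same transformation to the query before searching.
custom_query = customize(query_embedding, bias_matrix)

# Local stand-in for the vector store search: cosine similarity per chunk,
# then chunk indices ordered from most to least similar.
scores = (custom_query @ custom_chunks.T).ravel()
ranking = np.argsort(-scores)

print(custom_chunks.shape, custom_query.shape)
print(ranking)
```

The important point in the sketch is that the same matrix is applied to both the stored chunks and the query, so both sides live in the same 2048-dimensional space.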