# FAISS index fails to match acronyms


twin dust

(SOLVED)

I have a knowledge base with many arbitrary acronyms that I don't know in advance. When I query the FAISS index with an acronym, it fails to match any of the entries that contain it. I suspect this is caused by tokenisation. The model I'm using for embeddings is sentence-transformers/multi-qa-mpnet-base-dot-v1.

Any ideas on an approach that will let me query the index with any acronym?

twin dust

OK, two solutions worked for me:

  1. detecting acronyms in the query and wrapping them in square brackets before generating the embeddings
  2. switching to a different FAISS index type (IndexHNSWFlat instead of IndexIVFFlat)
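
The thread doesn't show code or say how the acronyms were detected. What follows is therefore only a minimal sketch of the first solution, and it assumes one particular rule: any standalone token of two or more uppercase letters or digits that starts with a letter counts as an acronym. The regex and the `bracket_acronyms` helper name are hypothetical and do not come from the thread. The rule will also catch shouted words such as "HELP" and will miss mixed-case acronyms such as "PhD".

```python
import re

# Assumed heuristic (not from the thread): an acronym is a standalone token of
# 2+ uppercase letters/digits starting with a letter. The lookarounds skip
# tokens that are already bracketed or embedded inside a longer word.
ACRONYM_RE = re.compile(r"(?<![\[\w])([A-Z][A-Z0-9]+)(?![\]\w])")


def bracket_acronyms(query: str) -> str:
    """Hypothetical helper: wrap each detected acronym in square brackets."""
    return ACRONYM_RE.sub(r"[\1]", query)


if __name__ == "__main__":
    q = "How do I configure SSO for the CRM?"
    print(bracket_acronyms(q))
    # Hypothetical downstream use (not executed here), with `model` a
    # SentenceTransformer and `index` an existing FAISS index:
    #   query_vec = model.encode([bracket_acronyms(q)])
    #   scores, ids = index.search(query_vec, 5)
```

The example prints `How do I configure [SSO] for the [CRM]?`. Only the query text changes, so an index that is already built can be searched as before.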

golden stream