# Similarity search of a PDF file


tight edge

I want to create a program that does a similarity search for any question about a specific PDF file. I've extracted all the data from the file, embedded it, and stored the embeddings in a list called `embed = []`. I've also embedded the query. How do I do a similarity search now?

waxen oar

Suppose you search for "fruit".
You embed the word (or sentence) "fruit" too, then compute its cosine similarity with every vector in your list. (You can use a library function for this, such as the `compute-cosine-similarity` module in Node.js.) Finally, sort the scores from high to low: the item with the highest score is the one most similar to "fruit".
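A minimal sketch of that approach in Python, using only the standard library and assuming every embedding is a list of floats of the same length. The helper names `cosine_similarity` and `rank_by_similarity`, the variable `query_vec`, and the toy 3-dimensional vectors are hypothetical; real embeddings usually have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # avoid dividing by zero for an all-zero vector
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, embed):
    """Score every vector in embed against query_vec, highest score first.

    Returns (index, score) pairs so each score can be traced back to its
    position in embed.
    """
    scores = [(i, cosine_similarity(query_vec, vec)) for i, vec in enumerate(embed)]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores


# Toy example with hypothetical 3-dimensional vectors.
embed = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
query_vec = [1.0, 0.1, 0.0]  # hypothetical embedding of "fruit"
ranked = rank_by_similarity(query_vec, embed)
print([i for i, _ in ranked])  # → [0, 2, 1]
```

Keeping the index next to each score, rather than sorting the vectors themselves, preserves the link back to whatever each vector was built from. For large collections, a vectorized library or a vector index would be faster than this pure-Python loop.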

tight edge

Now let's say I get the top 5 most relevant embeddings.


How do I retrieve the text related to those embeddings?

waxen oar

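You can get the text for each embedding by its index. Keep the extracted text chunks in a list in the same order as `embed`, so the embedding at position `i` belongs to the chunk at position `i`; then look up each of the top 5 indices in that text list.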
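A minimal Python sketch of index-based retrieval, assuming a hypothetical list `chunks` where `chunks[i]` is the text that was embedded to produce `embed[i]`. The function name `top_k_chunks`, the variable `query_vec`, and the toy data are illustrative only. It uses the standard library's `heapq.nlargest` to pick the top k scores without sorting the whole list.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_chunks(query_vec, embed, chunks, k=5):
    """Return (text, score) for the k embeddings most similar to query_vec.

    Assumes chunks[i] is the text that produced embed[i] (hypothetical layout).
    """
    if len(embed) != len(chunks):
        raise ValueError("embed and chunks must have the same length")
    # (score, index) tuples: nlargest compares by score first.
    scored = ((cosine_similarity(query_vec, vec), i) for i, vec in enumerate(embed))
    best = heapq.nlargest(k, scored)
    # The same index that located the embedding also locates its text.
    return [(chunks[i], score) for score, i in best]


# Toy usage with hypothetical 2-dimensional vectors.
chunks = ["Apples and bananas are fruit.", "The invoice is due Friday.", "Oranges are citrus fruit."]
embed = [[0.9, 0.1], [0.0, 1.0], [0.8, 0.3]]
query_vec = [1.0, 0.0]
for text, score in top_k_chunks(query_vec, embed, chunks, k=2):
    print(f"{score:.3f}  {text}")
```

The whole approach depends on `chunks` and `embed` being built in the same loop (or otherwise kept in the same order); storing each chunk and its vector together, for example as pairs, is a simple way to make that guarantee explicit.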