I am analysing a large dataset of concepts applied to a specific domain.
These concepts may mean different things in general, and mean something very similar when applied to specific domains.
For example: “Python” and “Go”, as general terms or specifically applied on the computer science domain.
If i wanted to cluster a large dataset of computer science concepts, by retrieving their embeddings and performing kmeans or a similar algorithm. Does it make sense to pre-process the concepts and add a computer science concept before calling the embedding api?
For the example above: instead of retrieving the embeddings of “Python” and “Go”, would it make sense to retrieve the embeddings of “‘Python’ (Computer Science)” and “‘Go’ (Computer Science)”
Assuming the answer is yes, are there any relevant examples or papers on this topic?