#How to embed tokens instead of sentences with the tokenizer API?

patent trench
#

I'd love to embed a few sentences containing out-of-vocabulary words, like "LFT_GUG is 5 inches".
In this case, LFT_GUG gets broken into multiple characters (due to the nature of the tokenizer) and loses its integrity for cosine similarity.
Is there a way to embed tokens instead of sentences, so that I can manually make sure such words are not broken down and stay intact through the tokenisation process?
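To illustrate what I mean, here is a minimal sketch in Python. It is a toy greedy longest-match tokenizer, not the actual tokenizer behind the API; `TOY_VOCAB`, `PROTECTED`, `split_word` and `tokenize` are all hypothetical names made up for this example.

```python
# Toy greedy longest-match subword tokenizer, for illustration only.
# TOY_VOCAB and PROTECTED are hypothetical; they do not come from any real model.
TOY_VOCAB = {"is", "5", "inches", "L", "F", "T", "_", "G", "U"}
PROTECTED = {"LFT_GUG"}


def split_word(word, vocab):
    """Greedily match the longest vocabulary entry at each position."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        # Shrink the candidate span until it matches a vocabulary entry.
        while end > start and word[start:end] not in vocab:
            end -= 1
        if end == start:
            # Nothing matches: emit an unknown marker for this character.
            pieces.append("[UNK]")
            start += 1
        else:
            pieces.append(word[start:end])
            start = end
    return pieces


def tokenize(text, vocab, protected=frozenset()):
    """Split on whitespace, keeping protected words as single tokens."""
    tokens = []
    for word in text.split():
        if word in protected:
            tokens.append(word)
        else:
            tokens.extend(split_word(word, vocab))
    return tokens


sentence = "LFT_GUG is 5 inches"
print(tokenize(sentence, TOY_VOCAB))             # LFT_GUG is split into single characters
print(tokenize(sentence, TOY_VOCAB, PROTECTED))  # LFT_GUG stays one token
```

The second call is the behaviour I am after. What I don't know is whether the embedding endpoint can accept a pre-tokenized input like this, or whether it would have a meaningful vector for a token it has never seen.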
Thanks in advance!

spark mirage
#

Hello