#Document Chunking & Vectorisation

bold sapphire
a language model takes input text and returns output text
usually the input text is the question and the output text is the response

question -> AI/LLM -> response

when you need to use books to answer the question, you provide text from the book along with the question as the input text
but language models have a limited input size, so you cannot put the whole book in the input
so the book is split into chunks, and to answer a question the top N chunks are fetched and provided to the LLM (e.g. GPT)
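
a minimal sketch of the splitting step, in Python. the function name `chunk_text`, the word-based chunk size and the optional `overlap` between neighbouring chunks are assumptions made for illustration; real pipelines often split by tokens, sentences or paragraphs instead

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into chunks of at most `chunk_size` words.

    Neighbouring chunks share `overlap` words so that a sentence cut at a
    boundary still appears whole in at least one chunk (an assumption of
    this sketch, not something every pipeline does).
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    words = text.split()
    step = chunk_size - overlap  # how far the window moves each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks
```

calling `chunk_text(book_text, chunk_size=200, overlap=20)` on a hypothetical `book_text` string returns a list of strings, each small enough to fit into the model input together with the question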

given chunk1 + chunk2 + chunk3 + chunk4 + chunk5 + chunk6, answer the question -> AI/LLM -> response
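
the line above can be sketched as a small prompt-building helper. the function name `build_prompt` and the exact prompt wording are assumptions for illustration, not a format any particular model requires

```python
def build_prompt(chunks: list[str], question: str) -> str:
    """Combine the selected chunks and the question into one input text."""
    # Label each chunk so the pieces stay distinguishable inside the prompt.
    context = "\n\n".join(
        f"[chunk {i}] {chunk}" for i, chunk in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )
```

the returned string is what gets sent to the LLM as its input text, so the number and size of the chunks has to stay within the model's input limit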

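in order to select which chunks fit the question best, the question and each chunk are converted (embedded) into numerical vectors, and the distance between the question vector and each chunk vector is compared; the chunks closest to the question are the ones selected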
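
a rough sketch of that selection step. the `embed` function here is only a toy stand-in (plain word counts) so the example runs without any external service; a real system would call an embedding model at that point. the names `embed`, `cosine_similarity` and `top_n_chunks` are hypothetical, and cosine similarity is just one common choice of distance measure

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: a sparse vector of word counts.

    Assumption of this sketch: a real pipeline would replace this with a
    call to an embedding model that returns a dense vector.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse vectors (higher means closer)."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is not similar to anything
    return dot / (norm_a * norm_b)


def top_n_chunks(question: str, chunks: list[str], n: int = 3) -> list[str]:
    """Return the n chunks whose vectors are closest to the question vector."""
    question_vector = embed(question)
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(question_vector, embed(chunk)),
        reverse=True,  # most similar first
    )
    return ranked[:n]
```

in practice the chunk vectors are usually computed once and stored, so only the question has to be embedded at query time; this sketch re-embeds every chunk on each call to keep it short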

covert urchin