#File Search Assistants API
Hi @fickle patio. I'm kapa.ai, your friendly support bot. I will try to answer your question by searching through the knowledge base. :wave:
To create an Assistant that uses as few tokens as possible when performing a file search, you can follow a strategy similar to the one described in a forum post (https://community.openai.com/t/783317) on the OpenAI community. This strategy involves creating embeddings for your file content and storing them locally. When a user asks a question, you create an embedding for the question, perform a vector search on your local embedding store, and retrieve relevant context. This context, along with the question, is added to the thread, and the thread is executed to retrieve the result.
Here's a brief outline of the workflow:
- Upload your file to OpenAI's files API and attach the file ID to the vector store.
- Extract the file content on a page-by-page basis and create embeddings for the pages via the embedding API. Ensure that there is an overlap across pages.
- Save the embeddings to a local database table along with the text.
- When a user asks a question, create an embedding for the question.
- Perform a vector search on your local embedding store and retrieve relevant context (with a distance < 0.6).
- Add the context (if found) and the question to the thread.
- Execute the thread and retrieve the result.
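The local part of this workflow (overlapping pages, a local embedding store, and a distance-filtered search) can be sketched roughly as follows. This is a minimal sketch, not the forum author's code: the names `LocalEmbeddingStore`, `pages_with_overlap`, and `build_user_message` are hypothetical, the in-memory list stands in for the database table, cosine distance is an assumption (the post only states a threshold of 0.6), and `embed` is any function you supply that turns text into a vector, for example a wrapper around the embedding API.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Assumption: the caller supplies the embedding function (e.g. an API wrapper).
Embedder = Callable[[str], Sequence[float]]


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity; 0.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        return 1.0  # treat zero vectors as unrelated
    return 1.0 - dot / norm


def pages_with_overlap(pages: List[str], overlap_chars: int = 200) -> List[str]:
    """Prefix each page with the tail of the previous page so context spans page breaks."""
    chunks = []
    for i, page in enumerate(pages):
        prefix = pages[i - 1][-overlap_chars:] if i > 0 and overlap_chars > 0 else ""
        chunks.append(prefix + page)
    return chunks


class LocalEmbeddingStore:
    """In-memory stand-in for the local database table of (embedding, text) rows."""

    def __init__(self, embed: Embedder) -> None:
        self.embed = embed
        self.rows: List[Tuple[List[float], str]] = []

    def add_pages(self, pages: List[str], overlap_chars: int = 200) -> None:
        """Embed each overlapping page chunk and store it with its text."""
        for chunk in pages_with_overlap(pages, overlap_chars):
            self.rows.append((list(self.embed(chunk)), chunk))

    def search(self, question: str, max_distance: float = 0.6, top_k: int = 3) -> List[str]:
        """Return up to top_k stored texts whose distance to the question is below max_distance."""
        q = self.embed(question)
        scored = sorted(
            ((cosine_distance(q, vec), text) for vec, text in self.rows),
            key=lambda pair: pair[0],
        )
        return [text for dist, text in scored[:top_k] if dist < max_distance]


def build_user_message(question: str, context: List[str]) -> str:
    """Combine retrieved context (if any) with the question for the thread message."""
    if not context:
        return question
    return "Context:\n" + "\n---\n".join(context) + "\n\nQuestion: " + question
```

The string returned by `build_user_message` is what you would add to the thread before running it. Because only the matching pages are sent, the prompt size is bounded by `top_k` pages rather than by whatever the hosted file search retrieves.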
This approach can help minimize token usage, because you control exactly how much context is added to the thread, and it can serve as a temporary workaround for vector store issues. Once a vector store issue is resolved, you can disable the manual embedding module and rely on OpenAI to retrieve the relevant context.
Remember, each file can be at most 512 MB in size and have a maximum of 5,000,000 tokens as per the OpenAI documentation (https://platform.openai.com/docs/assistants/deep-dive/creating-assistants).
@brave sundial I mean on the Website. I only have three options. Tokens per chunk, overlap and max chunk outputs!
Let me check the knowledge base for you. One moment! :books:
When using the file_search tool on the OpenAI website, you have three options to control the token usage:
- Tokens per chunk (max_chunk_size_tokens): This is the maximum number of tokens in each chunk. The default value is 800. The minimum value is 100 and the maximum value is 4096. This value determines how your file is divided into chunks for processing. Smaller chunks will use fewer tokens but may not capture the full context of the information in the file.
- Overlap (chunk_overlap_tokens): This is the number of tokens that overlap between chunks. The default value is 400. The overlap must not exceed half of max_chunk_size_tokens. This overlap can help ensure that the context is not lost between chunks.
- Max chunk outputs: This is the maximum number of chunks that the file_search tool will add to the context. By default, the file_search tool adds up to 20 chunks to the context.
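To minimize token usage, you could reduce the max_chunk_size_tokens and chunk_overlap_tokens values, and also lower the max chunk outputs setting. The chunk text added to the context per search is at most roughly max chunk outputs × tokens per chunk, so the defaults allow up to about 20 × 800 = 16,000 tokens. Be aware that lowering any of these values could also reduce the quality of the search results, as less context will be available for the model to understand the content of the file.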
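The limits described above can be checked, and the effect on token usage estimated, with a small helper. This is a rough sketch: the function names are hypothetical, the dictionary shape is my assumption of how a static chunking strategy is passed when setting the same values through the API instead of the website, and the estimate only counts retrieved chunk text, not the question or other overhead.

```python
from typing import Any, Dict


def static_chunking_strategy(
    max_chunk_size_tokens: int = 800,
    chunk_overlap_tokens: int = 400,
) -> Dict[str, Any]:
    """Validate the documented limits and build a chunking settings dict."""
    if not 100 <= max_chunk_size_tokens <= 4096:
        raise ValueError("max_chunk_size_tokens must be between 100 and 4096")
    if chunk_overlap_tokens < 0 or chunk_overlap_tokens > max_chunk_size_tokens / 2:
        raise ValueError("chunk_overlap_tokens must not exceed half of max_chunk_size_tokens")
    # Assumed payload shape; verify against the current API reference before use.
    return {
        "type": "static",
        "static": {
            "max_chunk_size_tokens": max_chunk_size_tokens,
            "chunk_overlap_tokens": chunk_overlap_tokens,
        },
    }


def worst_case_context_tokens(max_chunk_size_tokens: int = 800, max_chunks: int = 20) -> int:
    """Rough upper bound on chunk tokens that one file search can add to the context."""
    return max_chunk_size_tokens * max_chunks


# Compare the defaults with a leaner configuration.
default_bound = worst_case_context_tokens()      # 800 tokens x 20 chunks
lean_bound = worst_case_context_tokens(400, 5)   # 400 tokens x 5 chunks
lean_strategy = static_chunking_strategy(400, 100)
```

With these numbers the default bound is 16,000 tokens and the leaner configuration is 2,000 tokens, which shows that max chunk outputs is the setting with the most direct effect on how much is sent per search.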