#File Search Assistants API

fickle patio

I have a file that is 56,000 tokens long. I've uploaded it to a vector store. I want to create an Assistant that uses as few tokens as possible when doing the file search.

brave sundial BOT

Hi @fickle patio. I'm kapa.ai, your friendly support bot. I will try to answer your question by searching through the knowledge base. :wave:

To create an Assistant that uses as few tokens as possible when performing a file search, you can follow a strategy similar to the one described in a forum post (https://community.openai.com/t/783317) on the OpenAI community. This strategy involves creating embeddings for your file content and storing them locally. When a user asks a question, you create an embedding for the question, perform a vector search on your local embedding store, and retrieve relevant context. This context, along with the question, is added to the thread, and the thread is executed to retrieve the result.

Here's a brief outline of the workflow:

  1. Upload your file to OpenAI's files API and attach the file ID to the vector store.
  2. Extract the file content on a page-by-page basis and create embeddings for the pages via the embedding API. Ensure that there is an overlap across pages.
  3. Save the embeddings to a local database table along with the text.
  4. When a user asks a question, create an embedding for the question.
  5. Perform a vector search on your local embedding store and retrieve relevant context (with a distance < 0.6).
  6. Add the context (if found) and the question to the thread.
  7. Execute the thread and retrieve the result.
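Steps 2 to 6 above could be sketched roughly as follows. This is a minimal, offline illustration and not the forum post's actual code: `embed` is a hypothetical stand-in (a bag-of-words count vector) for a real embedding API call, a plain list stands in for the local database table, and the function names are made up for this example. Only the 0.6 distance cutoff comes from the outline.

```python
import math
from collections import Counter


def embed(text):
    """Hypothetical stand-in for an embedding API call: word counts."""
    return Counter(text.lower().split())


def cosine_distance(a, b):
    """1 - cosine similarity of two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def pages_with_overlap(pages, overlap_words=2):
    """Step 2: prefix each page with the last words of the previous page."""
    result = []
    for i, page in enumerate(pages):
        tail = pages[i - 1].split()[-overlap_words:] if i > 0 else []
        result.append(" ".join(tail + page.split()))
    return result


def build_store(pages):
    """Step 3: keep (text, embedding) pairs; a list replaces the DB table."""
    return [(text, embed(text)) for text in pages_with_overlap(pages)]


def retrieve(store, question, max_distance=0.6, top_k=3):
    """Steps 4-5: embed the question, keep the closest pages under the cutoff."""
    q = embed(question)
    scored = sorted((cosine_distance(q, emb), text) for text, emb in store)
    return [text for dist, text in scored if dist < max_distance][:top_k]


def build_message(store, question):
    """Step 6: prepend the context, if any was found, to the question."""
    context = retrieve(store, question)
    if not context:
        return question
    return "Context:\n" + "\n---\n".join(context) + "\n\nQuestion: " + question


store = build_store(["the warranty lasts two years", "returns need a receipt"])
print(build_message(store, "warranty two years"))
```

The string returned by `build_message` is what would be added to the thread before running it (step 7). Because only pages under the distance cutoff are included, the number of tokens sent is bounded by `top_k` pages rather than by the whole file.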

This approach can help minimize token usage and provide a temporary solution to any vector store issues. Once the vector store issue is resolved, you can disable the module for doing the manual embedding and rely on OpenAI to retrieve the relevant context.

Remember, each file can be at most 512 MB in size and have a maximum of 5,000,000 tokens as per the OpenAI documentation (https://platform.openai.com/docs/assistants/deep-dive/creating-assistants).

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:

fickle patio

@brave sundial I mean on the website. I only have three options: tokens per chunk, overlap, and max chunk outputs!

brave sundial BOT

Let me check the knowledge base for you. One moment! :books:

When using the file_search tool on the OpenAI website, you have three options to control the token usage:

  1. Tokens per chunk (max_chunk_size_tokens): This is the maximum number of tokens in each chunk. The default value is 800. The minimum value is 100 and the maximum value is 4096. This value determines how your file is divided into chunks for processing. Smaller chunks will use fewer tokens but may not capture the full context of the information in the file.

  2. Overlap (chunk_overlap_tokens): This is the number of tokens that overlap between chunks. The default value is 400. The overlap must not exceed half of max_chunk_size_tokens. This overlap can help ensure that the context is not lost between chunks.

  3. Max chunk outputs: This is the maximum number of chunks that the file_search tool will add to the context. By default, the file_search tool adds up to 20 chunks to the context.
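The limits on the first two settings can be checked before they are entered. The helper below is a small sketch: the function name is hypothetical, and the returned dictionary layout (`type: "static"` with a nested `static` object) is an assumption about how these settings are passed when attaching a file through the API, so verify it against the current API reference.

```python
def static_chunking_strategy(max_chunk_size_tokens=800, chunk_overlap_tokens=400):
    """Validate chunk settings against the limits listed above."""
    if not 100 <= max_chunk_size_tokens <= 4096:
        raise ValueError("max_chunk_size_tokens must be between 100 and 4096")
    # The overlap must not exceed half of the chunk size.
    if chunk_overlap_tokens < 0 or chunk_overlap_tokens * 2 > max_chunk_size_tokens:
        raise ValueError("chunk_overlap_tokens must be at most half the chunk size")
    # Assumed payload shape; check the API reference before relying on it.
    return {
        "type": "static",
        "static": {
            "max_chunk_size_tokens": max_chunk_size_tokens,
            "chunk_overlap_tokens": chunk_overlap_tokens,
        },
    }


print(static_chunking_strategy(200, 50))
```

Note that lowering the chunk size below 800 without also lowering the overlap fails this check, because the default overlap of 400 would then be more than half of the chunk size.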

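To minimize token usage, the most direct levers are max chunk outputs and max_chunk_size_tokens: as a rough upper bound, the tool adds at most (max chunk outputs × tokens per chunk) tokens of retrieved text to the context, which is 20 × 800 = 16,000 tokens with the defaults. If you lower max_chunk_size_tokens, lower chunk_overlap_tokens as well so that it stays within half of the chunk size. Be aware that fewer or smaller chunks could also reduce the quality of the search results, as less context will be available for the model to understand the content of the file.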
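To compare settings, the worst-case amount of retrieved text per search can be estimated with simple arithmetic. This is a back-of-the-envelope sketch with a hypothetical function name; it assumes every returned chunk is full-sized and ignores the tokens used by the question, the instructions, and the answer.

```python
def max_retrieved_tokens(max_chunk_size_tokens=800, max_chunk_outputs=20):
    """Rough upper bound on tokens that file search adds to the context."""
    return max_chunk_size_tokens * max_chunk_outputs


# Defaults versus a smaller, hypothetical configuration.
print(max_retrieved_tokens())
print(max_retrieved_tokens(max_chunk_size_tokens=200, max_chunk_outputs=5))
```

For a 56,000-token file, either bound is well below the size of the file itself, so the saving comes from how many chunks are returned and how large they are, not from the file length.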

For more details, you can refer to the OpenAI API documentation (https://platform.openai.com/docs/api-reference) and the OpenAI Help Article (https://help.openai.com/en/articles/8550641-assistants-api-v2-faq#h_0c74296d66).
