#Sending Large text with 700K tokens using gpt-4o model

1 messages · Page 1 of 1 (latest)

eager crest
#

I am using large text as a knowledgebase with 700K tokens.
I need to get a response based on this knowledgebase by sending to OpenAI api.

so I can split this into 100K tokens of text.

I can send first chunk to OpenAI. then get a response (this is summarized text with only few sentences , so poor knowlegebase).

I doin't think openAI remembers previous sent message.

then how should I send from 2nd chunk?

help me please .

simple fractalBOT
#

Hi @eager crest. I'm kapa.ai, your friendly support bot. I will try to answer your question by searching through the knowledge base. :wave:

#

To handle large texts that exceed the context window of the OpenAI models, you can use a strategy of chunking and summarizing. Here's a step-by-step approach based on the knowledge sources:

  1. Chunk the Text: Divide your large text into smaller chunks that fit within the model's context window. For gpt-4o, the context window is 128k tokens. You can use a function to split the text into manageable pieces. Here's an example of how you might chunk text:

    def chunk_on_delimiter(input_string: str, max_tokens: int, delimiter: str) -> List[str]:
        chunks = input_string.split(delimiter)
        combined_chunks, _, dropped_chunk_count = combine_chunks_with_no_minimum(
            chunks, max_tokens, chunk_delimiter=delimiter, add_ellipsis_for_overflow=True
        )
        if dropped_chunk_count > 0:
            print(f"warning: {dropped_chunk_count} chunks were dropped due to overflow")
        combined_chunks = [f"{chunk}{delimiter}" for chunk in combined_chunks]
        return combined_chunks
    
  2. Summarize Each Chunk: Send each chunk to the model and get a summary. This will help in reducing the size of the text while retaining the essential information. You can use a summarization function like this:

    def summarize(text: str, detail: float = 0, model: str = 'gpt-4-turbo', additional_instructions: Optional[str] = None):
        # Summarizes a given text by splitting it into chunks, each of which is summarized individually.
        # The level of detail in the summary can be adjusted.
        # ...
        response = get_chat_completion(messages, model=model)
        return response
    
  3. Combine Summaries: After summarizing each chunk, you can combine these summaries to form a more comprehensive summary of the entire text.

#
  1. Recursive Summarization: If needed, you can apply recursive summarization to further condense the information. This involves summarizing the summaries until you reach the desired level of detail.

  2. Use Contextual Summarization: If you need to maintain context across multiple chunks, consider using a strategy where you include previous summaries as part of the input for summarizing the next chunk. This can help maintain continuity and context.

For more detailed guidance, you can refer to the OpenAI documentation on managing context for text generation and summarizing long documents.

If you have further questions or need more specific examples, feel free to ask!

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:

#

@eager crest Kapa.ai is still learning and improving, please let me know how I did by reacting below.