#how to overcome open ai api character limit

1 messages · Page 1 of 1 (latest)

rare plaza
#

hi, i'm working with the open ai api and i'm pretty sure that the text i'm inputting is over 4000 characters. is there any way to send the text in chunks? i'm trying to create a summarizer

sly lichenBOT
#

Hi @rare plaza. I'm kapa.ai, your friendly support bot. I will try to answer your question by searching through the knowledge base. :wave:

#

Yes, you can overcome the character limit by splitting your text into smaller chunks and processing each chunk separately. This approach is particularly useful when you're working with large documents and want to create a summarizer.

Here's a general approach based on the information from the OpenAI Cookbook (https://cookbook.openai.com/examples/summarizing_long_documents):

  1. Split your document into smaller pieces. The size of these pieces will depend on the character limit of the API you're using. For instance, if you're using the GPT-4 API, the limit is 4096 characters.

  2. Process each piece separately with the API. You can do this by sending each piece as a separate request to the API.

  3. Combine the results. After you've processed all the pieces, you can combine the results to get a summary of the entire document.

Here's a Python code snippet that demonstrates how to split a document into smaller pieces:

import os
from typing import List, Tuple, Optional
from openai import OpenAI
import tiktoken
from tqdm import tqdm

# open dataset containing part of the text of the Wikipedia page for the United States
with open("data/artificial_intelligence_wikipedia.txt", "r") as file:
    artificial_intelligence_wikipedia_text = file.read()

# load encoding and check the length of dataset
encoding = tiktoken.encoding_for_model('gpt-4-turbo')
len(encoding.encode(artificial_intelligence_wikipedia_text))

# define a simple utility to wrap calls to the OpenAI API
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

def get_chat_completion(messages, model='gpt-4-turbo'):
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=0,
    )
    return response.choices[0].message.content
#

define some utilities to chunk a large document into smaller pieces

def tokenize(text: str) -> List[str]:
encoding = tiktoken.encoding_for_model('gpt-4-turbo')
return encoding.encode(text)


Remember to replace `"data/artificial_intelligence_wikipedia.txt"` with the path to your document and `'gpt-4-turbo'` with the model you're using. Also, make sure to set your OpenAI API key in the environment variable `OPENAI_API_KEY`.

Please note that the response object may not return completions in the order of the prompts, so always remember to match responses back to prompts using the `index` field.
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------