Hi there! I'm having some trouble with paraphrasing long articles using the OpenAI API. I'm currently using the text-davinci-003 model through the Completion API, but I keep getting an error message saying that the maximum context length is exceeded. I was wondering if anyone had any advice on how to best handle paraphrasing long texts, such as blog articles. Is it necessary to break the text into smaller parts, and if so, how can I maintain coherence between the different fragments? Are there any external services that offer API access for paraphrasing that could be integrated into my Python code? Any help or guidance would be greatly appreciated!
#Seeking help with paraphrasing long texts using OpenAI API
I broke the text into chunks and assigned a text ID to each segment. You can then send queries or prompts to the model that refer to the chunks by their IDs.
For example, you can pass all the chunk IDs in a single query to reference the entire article.
@red orbit Could you give me more tips? I'm just getting to know this topic. I have a list of product descriptions. I want to rephrase them into new text.
So you divide the text into chunks? If you've done something like this before, could you share a piece of code?
You can create a dictionary in Python that gives a key to each of your lists. For example:
```
# Create the dictionary and add the product descriptions
my_dict = {}
my_dict['prod_desc1'] = ["this is product description 1."]
my_dict['prod_desc2'] = ["this is product description 2."]
my_dict['prod_desc3'] = ["this is product description 3."]

# Loop through my_dict to collect every description in a single list
all_descriptions = []
for key in my_dict:
    all_descriptions += my_dict[key]
```
where the lists are your product descriptions. Finally, you can join all the values into a single string and send an API call to the model to summarize the string.
You can use this function to split a long 'string' into chunks with a preset length 'chunk_size' and assign a chunk ID to each text segment:
```
def split_string(string, chunk_size):
    # Split the string into chunks of chunk_size characters
    chunks = [string[i:i + chunk_size] for i in range(0, len(string), chunk_size)]
    # Add an ID to each chunk, starting at 1
    chunk_dict = {}
    for i, chunk in enumerate(chunks):
        chunk_dict[i + 1] = chunk
    return chunk_dict
```
You are the best, @red orbit
So you divide this text into chunks and send it one at a time via the api? Then you add the result to the list and finally you have a new description?
What prompt do you send to Openai?
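A minimal sketch of that chunk-by-chunk loop, in case it helps. It assumes the chunks are already in a dictionary keyed by ID, like the one `split_string` returns. The `paraphrase` argument is a hypothetical placeholder for whatever function sends one chunk to the model and returns the rewritten text; `fake_paraphrase` is only a stand-in so the example runs without an API key.

```python
from typing import Callable, Dict


def paraphrase_chunks(chunk_dict: Dict[int, str],
                      paraphrase: Callable[[str], str]) -> str:
    """Paraphrase each chunk in ID order and join the results."""
    rewritten = []
    for chunk_id in sorted(chunk_dict):
        # One request per chunk, so each prompt only carries one chunk.
        rewritten.append(paraphrase(chunk_dict[chunk_id]).strip())
    return " ".join(rewritten)


def fake_paraphrase(text: str) -> str:
    # Hypothetical stand-in for a real API call; it only upper-cases the text.
    return text.upper()


chunks = {1: "first part.", 2: "second part."}
new_description = paraphrase_chunks(chunks, fake_paraphrase)
print(new_description)
```

Because each chunk is rewritten without seeing the others, the joined text may still need a read-through for consistency between fragments.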
I don't get it
```
import nltk  # sent_tokenize needs NLTK's Punkt tokenizer data to be downloaded

def split_string(string, chunk_size):
    # Split the string into sentences
    sentences = nltk.sent_tokenize(string)
    # Split the sentences into chunks of chunk_size sentences each
    chunks = [sentences[i:i + chunk_size] for i in range(0, len(sentences), chunk_size)]
    # Add an ID to each chunk and keep a list of the IDs
    chunk_dict = {}
    chunk_ids = []
    for i, chunk in enumerate(chunks):
        chunk_dict[i + 1] = chunk
        chunk_ids.append(i + 1)
    return chunk_dict, chunk_ids
```
I would send all the chunks through the API, followed by a final prompt that asks for what you need by referencing the IDs. Something like “textID1: some text, textID2: some text…” and then a prompt like “summarize textID1 and textID2”. I haven’t personally tried it with the API, but it definitely works with ChatGPT.
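Here is a rough sketch of how such an ID-labelled prompt could be assembled. The `build_prompt` name and the exact layout are assumptions for illustration, not something the API requires. It also assumes each chunk is a plain string and that everything is sent in one request, so the labelled chunks plus the instruction still have to fit within the model's context length.

```python
def build_prompt(chunk_dict, instruction):
    """Label every chunk with its text ID, then append the instruction."""
    ordered_ids = sorted(chunk_dict)
    # One "textID<n>: <text>" line per chunk, in ID order.
    labelled = [f"textID{chunk_id}: {chunk_dict[chunk_id]}" for chunk_id in ordered_ids]
    # The instruction refers to the chunks by ID, e.g. "textID1 and textID2".
    id_list = " and ".join(f"textID{chunk_id}" for chunk_id in ordered_ids)
    return "\n".join(labelled) + f"\n\n{instruction} {id_list}"


prompt = build_prompt({1: "some text", 2: "more text"}, "Summarize")
print(prompt)
```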
NLTK's Treebank implementation could definitely be used to split the text in a better way.
Now I understand 😁
I do the same on ChatGPT: I give a command at the beginning of the conversation
```
Message number: 1
<here is the content of the response>
Your second message should look like this:
Message number: 2
<here is the content of the response>
All subsequent messages should always be numbered.
Message number: 34
<here is the content of the response>
```
But the API has no thread memory, so you can't refer to the previous messages
Btw, this just happened to me: openai.error.RateLimitError: The server is currently overloaded with other requests. Sorry about that! You can retry your request, or contact us through our help center at help.openai.com if the error persists.
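Since the error message itself says the request can be retried, one common approach is to retry with an increasing delay. The sketch below is only an illustration: `call_with_retries` and its parameters are made-up names, `FakeRateLimitError` is a hypothetical stand-in for `openai.error.RateLimitError`, and `flaky_request` simulates a call that fails twice before succeeding.

```python
import random
import time


def call_with_retries(request, retryable=(Exception,), max_attempts=5,
                      base_delay=1.0, sleep=time.sleep):
    """Call request(); on a retryable error, wait and try again."""
    for attempt in range(max_attempts):
        try:
            return request()
        except retryable:
            if attempt == max_attempts - 1:
                raise  # Out of attempts, let the caller see the error.
            # Exponential backoff plus a little random jitter.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)


class FakeRateLimitError(Exception):
    """Hypothetical stand-in for openai.error.RateLimitError."""


attempts = {"count": 0}


def flaky_request():
    # Simulated API call: fails on the first two tries, then succeeds.
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise FakeRateLimitError("server overloaded")
    return "ok"


waits = []
result = call_with_retries(flaky_request, retryable=(FakeRateLimitError,),
                           base_delay=0.01, sleep=waits.append)
print(result, attempts["count"], len(waits))
```

In real use you would pass the actual exception class in `retryable` and leave `sleep` at its default so the code really waits between attempts.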
You can have it retain information by either fine-tuning the model or storing the conversations in a database and generating some code through the API to interact with the database. For the error, do you have a paid subscription with them?
It's actually great because I have all my product descriptions saved in the database
Hmm, so maybe there is no need to send fragments of text via the prompt
hi @raven crescent @red orbit
Is it possible to train the model with text content?
So to have thread memory in the API, all I have to do is save logs from prompts and their response? Then I will send logs in database format with each query sent?
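A small sketch of what that logging could look like with Python's built-in `sqlite3` module. The table layout and the `save_message` / `load_history` helpers are assumptions made for illustration, and the in-memory database is only there so the example runs on its own.

```python
import sqlite3

# In-memory database for the example; a file path would make it persistent.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE IF NOT EXISTS messages ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "conversation TEXT NOT NULL, "
    "role TEXT NOT NULL, "
    "content TEXT NOT NULL)"
)


def save_message(conn, conversation, role, content):
    """Store one prompt or response for the given conversation."""
    conn.execute(
        "INSERT INTO messages (conversation, role, content) VALUES (?, ?, ?)",
        (conversation, role, content),
    )
    conn.commit()


def load_history(conn, conversation):
    """Return the conversation's messages, oldest first, as (role, content)."""
    rows = conn.execute(
        "SELECT role, content FROM messages WHERE conversation = ? ORDER BY id",
        (conversation,),
    )
    return list(rows)


save_message(conn, "demo", "user", "Rephrase: this is product description 1.")
save_message(conn, "demo", "assistant", "Here is the first product description.")
save_message(conn, "other", "user", "Unrelated conversation.")
history = load_history(conn, "demo")
print(history)
```

The loaded history can then be placed in front of the next prompt, as long as the combined text stays within the model's context length.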
I just wrote this.
okay, thanks
and how can I generate a long article which contains more than 1200 words via api call?
I tried this, but it's not working:
"Generate a very long article about XXX of at least 1200 words"
@raven crescent
I think it's not that way
is it impossible?
Nothing is impossible 😁
I just learned that you can send data in a different format than text, so I'm just at the beginning
But I think it gives a lot of possibilities now
I'm on a free trial, but will switching to paid subscriptions change anything?
@raven crescent I used to get that error when I didn't have payment info. I added a credit card and voila! it never complained.
Which model are you using? I just changed from text-davinci-003 to text-davinci-002 and it works too
Enough for my tests
I’m using 3, fine-tuned to my use case
Was this not solved? How do I make the GPT-3 API remember the previous prompt?
You have to send the previous ones along with every new prompt
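Roughly like this, as a sketch. The `build_prompt_with_history` helper and the `max_chars` budget are assumptions for illustration: character count is only a crude proxy for the model's token limit, and the oldest turns are dropped first when the history no longer fits.

```python
def build_prompt_with_history(history, new_prompt, max_chars=6000):
    """Prepend as many of the most recent turns as fit within max_chars."""
    kept = []
    used = len(new_prompt)
    # Walk backwards so the newest turns are kept and the oldest are dropped.
    for turn in reversed(history):
        cost = len(turn) + 1  # +1 for the newline that joins the turns
        if used + cost > max_chars:
            break
        kept.append(turn)
        used += cost
    kept.reverse()  # Restore chronological order.
    return "\n".join(kept + [new_prompt])


history = ["User: Hi", "AI: Hello"]
full_prompt = build_prompt_with_history(history, "User: Rephrase this.")
short_prompt = build_prompt_with_history(history, "User: Rephrase this.", max_chars=30)
print(full_prompt)
print(short_prompt)
```

After each response comes back, append both the prompt and the response to `history` so the next call includes them.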