I did not write the Python libraries for the openai package; I just use them. They keep accumulating the history on the client side instead of on the server side, where short follow-up questions would not require retokenizing everything. As a result, once the history reaches about 1/33rd of the supported context window size, it triggers the 30,000 tokens-per-minute limit error. Is the following not valid for my "send message and wait for a reply" function? Error handling was removed to shorten this.
import time

client.beta.threads.messages.create(thread_id=thread_id, role="user", content=user_input)
run = client.beta.threads.runs.create(thread_id=thread_id, assistant_id=assistant_id)
# Poll until the run completes; sleep between polls so the loop does not hammer the API.
while True:
    run = client.beta.threads.runs.retrieve(thread_id=thread_id, run_id=run.id)
    if run.status == 'completed':
        break
    time.sleep(1)
# messages.list returns newest first by default, so the first assistant message is the latest reply.
messages = client.beta.threads.messages.list(thread_id=thread_id)
for msg in messages.data:
    if msg.role == "assistant":
        prt(f"[AI] {msg.content[0].text.value}")
        prt(f"run.usage = {run.usage}")
        break
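
With error handling stripped, the polling loop above only exits on 'completed', so a run that fails (for example on a rate limit) would be polled forever. A minimal sketch of a more defensive wait follows. The helper name wait_for_run, its parameters, and the set of terminal statuses are my own assumptions for illustration, not part of the openai package, and the demonstration uses a fake run object so it works offline.

```python
import time
from types import SimpleNamespace

# Statuses after which further polling is pointless. This set is an assumption
# based on the documented Assistants API run lifecycle; adjust it if needed.
TERMINAL_STATUSES = {"completed", "failed", "cancelled", "expired", "incomplete"}


def wait_for_run(retrieve, poll_interval=1.0, timeout=120.0, sleep=time.sleep):
    """Poll retrieve() until the run reaches a terminal status, then return it.

    retrieve is a hypothetical zero-argument callable returning an object with
    a .status attribute, for example:
        lambda: client.beta.threads.runs.retrieve(thread_id=thread_id, run_id=run.id)
    """
    deadline = time.monotonic() + timeout
    while True:
        run = retrieve()
        if run.status in TERMINAL_STATUSES:
            return run
        if run.status == "requires_action":
            # The run is blocked on tool outputs; polling alone will not finish it.
            raise RuntimeError("run is waiting for tool outputs")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run still {run.status!r} after {timeout}s")
        sleep(poll_interval)


# Offline demonstration: a fake run that fails on the third poll.
_statuses = iter(["queued", "in_progress", "failed"])
demo_run = wait_for_run(
    lambda: SimpleNamespace(status=next(_statuses)),
    sleep=lambda seconds: None,  # skip real sleeping in the demo
)
print(demo_run.status)
```

The caller would then check the returned status before reading messages, and only print the assistant reply when it is 'completed'.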