We're using the gpt-4-0125-preview model with the new Assistants API to build a chatbot. The bot asks users a series of questions and uses tools such as retrieval, code interpreter, and function calling to organize the data.
Each user interaction creates a new assistant instance with specific instructions. If no thread exists, we create one and add the user's message to it. We then start a run on the thread and poll it for the GPT response. However, the token count keeps hitting the limit (32k), which disrupts the bot's flow and drives up both costs and the number of function executions.
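To make the flow above concrete, here is a simplified sketch of the thread-reuse and polling steps. It is not our production code: `get_or_create_thread`, `poll_run`, and the in-memory `threads` dict are placeholder names, and the actual API calls are passed in as plain callables so the snippet runs without the SDK. The exponential backoff between polls is an assumption added for illustration.

```python
import time
from typing import Callable, Dict

# Run statuses at which polling should stop: the run has finished,
# or it is waiting for us to submit function-call outputs.
STOP_STATUSES = {"completed", "failed", "cancelled", "expired", "requires_action"}


def get_or_create_thread(
    threads: Dict[str, str],
    user_id: str,
    create_thread: Callable[[], str],
) -> str:
    """Reuse the stored thread ID for this user, or create and store a new one."""
    if user_id not in threads:
        # create_thread is a hypothetical wrapper around the real "create thread" call.
        threads[user_id] = create_thread()
    return threads[user_id]


def poll_run(
    get_status: Callable[[], str],
    interval: float = 0.5,
    max_interval: float = 8.0,
    timeout: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll a run until it reaches a stop status, doubling the wait each time."""
    waited = 0.0
    while True:
        # get_status is a hypothetical wrapper that retrieves the run and returns its status.
        status = get_status()
        if status in STOP_STATUSES:
            return status
        if waited >= timeout:
            raise TimeoutError(f"run still '{status}' after {waited:.1f}s")
        sleep(interval)
        waited += interval
        interval = min(interval * 2, max_interval)
```

In a real bot, `create_thread` and `get_status` would wrap the corresponding SDK calls, and a `requires_action` result would be followed by submitting the function outputs and polling again.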
We're looking for guidance on improving this implementation, particularly on keeping token usage under the limit.