I have an issue with the token calculation in my Assistants API setup and I urgently need some help!
I have created an assistant based on the GPT-4-1106-preview model with a custom knowledge base (currently two PDF files).
Until now I did not have any issues, but since last month the token count for my input (the question) has become so drastically high that it costs me far too much money.
My normal use case is an input of max. 150 words and a response of max. 500 words.
Until now, whenever I asked a one-sentence question (4-5 words) and got a response of about 350-400 words, the number of tokens used was roughly 700-900, which is perfectly fine!
Now, if I ask the same question with the same 4-5 words, the input prompt length is counted at over 30,000 tokens! The output tokens, however, are still calculated as expected.
So the main problem is that thousands of tokens are somehow being added to the input prompt on top of the actual question!
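
To put these numbers in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes about 0.75 English words per token, which is only a common rule of thumb and not an exact tokenizer count. The function name `estimate_tokens` and the constants are made up for illustration. The point is just to show how small the question itself should be compared with the reported input size.

```python
# Rough estimate only: assumes ~0.75 English words per token
# (a rule of thumb, NOT an exact tokenizer count).
WORDS_PER_TOKEN = 0.75


def estimate_tokens(word_count: int) -> int:
    """Approximate the token count for a given number of English words."""
    return round(word_count / WORDS_PER_TOKEN)


# Figures taken from the description above.
question_tokens = estimate_tokens(5)      # a 4-5 word question
response_tokens = estimate_tokens(400)    # a 350-400 word response
reported_prompt_tokens = 30_000           # input size now being reported

# Tokens in the input that the question text itself cannot account for.
unexplained_tokens = reported_prompt_tokens - question_tokens

print(f"estimated question tokens: {question_tokens}")
print(f"estimated response tokens: {response_tokens}")
print(f"input tokens not explained by the question: {unexplained_tokens}")
```

Under that assumption, almost all of the reported input tokens come from something other than the words I typed.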