#Assistant's API tokenizer explanation


idle dome

Hey all, I have a question about the Assistants API (direct me elsewhere if this is not the correct place).
I am trying to understand how tokens are counted for the model I am using (gpt-4-turbo).

I am running into rate limits, and when I check the usage in the response, I see the following:
Usage(completion_tokens=37, prompt_tokens=17218, total_tokens=17255)
But when I paste the same prompt into the OpenAI tokenizer (https://platform.openai.com/tokenizer), I get ~500 tokens.

The prompt is an HTML extract, but I have cleaned the input (removed extra whitespace and HTML tags).

I am using the Assistants API with the file_search tool enabled. Any ideas on what might be going on? Thanks for your help!

idle dome

Ah, it seems I misunderstood how the file_search tool works. The file I uploaded is about 80K tokens (280K chars), so I am assuming that a large chunk of that is being included in the request and causing the prompt token count to balloon.

Is there documentation on how the files are chunked and searched? I am wondering if I can optimize the file to improve the search.
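
For anyone who wants to sanity-check the numbers above, here is a rough back-of-the-envelope sketch in Python. It only uses the figures quoted in this thread (17218 reported prompt tokens, ~500 tokens for the typed prompt, a file of about 80K tokens / 280K chars). The chunk size of 800 tokens and the limit of 20 retrieved chunks are assumptions: they are the file_search defaults as I remember them from the docs, so please verify them before relying on this. The function names are made up for illustration and are not part of any SDK.

```python
# Rough estimate of how much of the reported prompt is NOT the typed prompt.
# All inputs are the approximate figures quoted in the thread.

def extra_prompt_tokens(reported_prompt_tokens: int, typed_prompt_tokens: int) -> int:
    """Tokens in the request that did not come from the text the user typed."""
    return reported_prompt_tokens - typed_prompt_tokens


def retrieval_budget(chunk_tokens: int, max_chunks: int) -> int:
    """Upper bound on tokens that retrieved file chunks could add to one request."""
    return chunk_tokens * max_chunks


reported = 17218      # prompt_tokens from the Usage object
typed = 500           # approximate count from the web tokenizer
file_tokens = 80_000  # approximate size of the uploaded file
file_chars = 280_000

# ASSUMPTION: default file_search settings as I recall them; check the docs.
ASSUMED_CHUNK_TOKENS = 800
ASSUMED_MAX_CHUNKS = 20

extra = extra_prompt_tokens(reported, typed)
budget = retrieval_budget(ASSUMED_CHUNK_TOKENS, ASSUMED_MAX_CHUNKS)
share_of_file = extra / file_tokens
chars_per_token = file_chars / file_tokens

print(f"extra prompt tokens: {extra}")
print(f"assumed retrieval budget: {budget}")
print(f"extra tokens as share of file: {share_of_file:.1%}")
print(f"chars per token in the file: {chars_per_token}")
```

If those assumed defaults are right, the retrieval budget (16000 tokens) is close to the extra 16718 tokens in the usage response, with the remainder plausibly coming from instructions and tool definitions. That would fit the theory that retrieved chunks, rather than the tokenizer, explain the gap.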